Topic Models for Unsupervised Discovery of Viewpoints on the Web
1. Topic Models for Unsupervised Discovery
of Viewpoints on the Web
Modèles thématiques pour la découverte non supervisée
de points de vue sur le Web
PhD defense
Thibaut THONET
Advised by Guillaume CABANAC, Karen PINEL-SAUVAGNAT, Mohand BOUGHANEM
23 November 2017
2. Talk outline
1. Introduction
2. Literature Review
3. C1: Viewpoint Discovery in Text Documents
4. C2: Viewpoint Discovery in Social Networks
5. Conclusion
2 / 31
4. Introduction
‘Traditional’ opinion mining
Massive amount of opinions on the Web
=⇒ Need for automated methods to identify,
classify and summarize opinions
Traditional opinion mining research mainly focused on product/service review analysis
=⇒ Identification of a review’s polarity w.r.t. a target: positive/negative
Images and reviews taken from Wikipedia and Amazon.com, February 2016.
4 / 31
5. Introduction
Beyond traditional opinion mining: towards viewpoint mining
. . . But need to go beyond plain positive/negative opinions
E.g., to deal with filter bubbles [Pariser, 2011] & echo chambers [Sunstein, 2009]
Image taken from wired.com, October 2017.
5 / 31
6. Introduction
Beyond traditional opinion mining: towards viewpoint mining
. . . But need to go beyond plain positive/negative opinions =⇒ viewpoint-based opinions
A viewpoint is defined as the position adopted by a group of people on a given issue (e.g.,
related to policy, society or economy); it is underpinned by a set of specific values, beliefs or principles
Image taken from social.rollins.edu, November 2017.
5 / 31
7. Introduction
Beyond traditional opinion mining: towards viewpoint mining
. . . But need to go beyond plain positive/negative opinions =⇒ viewpoint-based opinions
Application: to build an argument map to help decision makers and the general public
Image taken from shale-gas-information-platform.org, March 2016.
5 / 31
9. Introduction
Challenges
Challenges compared to traditional opinion mining:
Viewpoints expressed in a more subtle way
than reviews’ polar opinions (“I like”, “I hate”)
and more domain dependent. . .
. . . Opinion lexicons then less useful for
viewpoint mining
Domain knowledge is not always available
and costly to gather
=⇒ Need for unsupervised approaches
E.g., latent variable models / topic models à la
Latent Dirichlet Allocation [Blei+, NIPS ’01]
Images taken from Wikipedia and jewishjournal.com, February 2016.
6 / 31
11. Literature Review
Related work: unsupervised viewpoint discovery in text documents
Comparison criteria (table columns): learning of viewpoint assignments; identification of viewpoint-specific discourse; words’ viewpoint-dependency guided by parts of speech
Compared references (table rows): [Paul+, EMNLP ’09]; [Fang+, WSDM ’12]; [Paul+, AAAI ’10] and [Trabelsi+, ICDM ’14]; ours [Thonet+, ECIR ’16]
Identification of viewpoint-specific discourse given (known) documents’ viewpoints:
Cross-cultural topic model by [Paul+, EMNLP ’09] to study culture-specific discourse
Cross-perspective topic model by [Fang+, WSDM ’12] based on part-of-speech to partition topical
words (topic-specific words) and opinion words (viewpoint/topic-specific words)
8 / 31
12. Literature Review
Related work: unsupervised viewpoint discovery in text documents
Learning documents’ viewpoint assignments based on text content:
Topic-Aspect Model by [Paul+, AAAI ’10] where aspects ≈ viewpoints
Joint Topic Viewpoint model by [Trabelsi+, ICDM ’14] to extract arguing expressions
8 / 31
13. Literature Review
Related work: unsupervised viewpoint discovery in text documents
Our first contribution [Thonet+, ECIR ’16] investigates the utility of topical/opinion
word partitioning based on part-of-speech to learn documents’ viewpoint
assignments
8 / 31
14. Literature Review
Related work: unsupervised viewpoint discovery in social media
Comparison criteria (table columns): learning of viewpoint assignments; identification of viewpoint-specific discourse; words’ viewpoint-dependency guided by parts of speech; designed for social media; leveraging of social network interactions
Compared references (table rows): [Paul+, EMNLP ’09]; [Fang+, WSDM ’12]; [Paul+, AAAI ’10] and [Trabelsi+, ICDM ’14]; ours [Thonet+, ECIR ’16]; [Qiu+, NAACL ’13] and [Qiu+, CIKM ’13]; [Joshi+, WASSA@NAACL ’16]; [Sachan+, WSDM ’14] and [Liu+, SDM ’14]; [Barberá, Polit. Anal. ’15]; ours [Thonet+, CIKM ’17]
Identifying users’ viewpoints in social media data:
Viewpoint modeling in forum posts by [Qiu+, NAACL ’13] and [Qiu+, CIKM ’13] based on a topic
model that leverages post reply information
Political affiliation (≈ viewpoint) prediction of Twitter users in [Joshi+, WASSA@NAACL ’16] based
on tweet content and part-of-speech but not social interactions
9 / 31
15. Literature Review
Related work: unsupervised viewpoint discovery in social media
Community detection in social networks:
Social Network Latent Dirichlet Allocation by [Sachan+, WSDM ’14] to discover communities
(≈ viewpoints) in social networks based on text content and social interactions
Similar model to SN-LDA in [Liu+, SDM ’14] but non-parametric and dynamic
9 / 31
16. Literature Review
Related work: unsupervised viewpoint discovery in social media
Ideal point model by [Barberá, Polit. Anal. ’15] to identify Twitter users’ (real-valued)
ideology (≈ viewpoint) based on follow interactions
9 / 31
17. Literature Review
Related work: unsupervised viewpoint discovery in social media
Our second contribution [Thonet+, CIKM ’17] proposes to identify users’ viewpoints
in social networks based on both text content and social interactions
9 / 31
19. C1: Viewpoint Discovery in Text Documents
Task
Discover topics and viewpoints from documents based on text content
11 / 31
20. C1: Viewpoint Discovery in Text Documents
VODUM: the Viewpoint and Opinion Discovery Unification Model
We designed a novel topic model to address our research task: the Viewpoint and Opinion
Discovery Unification Model [Thonet+, ECIR ’16]
12 / 31
21. C1: Viewpoint Discovery in Text Documents
VODUM: the Viewpoint and Opinion Discovery Unification Model
Topical words (topic-dependent) and opinion words (viewpoint/topic-dependent) partitioning
Inspired by opinion/viewpoint mining works: e.g., [Turney, ACL ’02], [Fang+, WSDM ’12]
Partition based on part-of-speech
=⇒ A word w is a topical word if its part-of-speech category x is 0 (noun) or an opinion
word if its part-of-speech category x is 1 (adjective, verb, adverb...)
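The part-of-speech partition above can be illustrated with a minimal Python sketch. The Penn Treebank tag prefixes and the helper names `pos_category` and `partition` are assumptions made for illustration; the slides only state "noun" versus "adjective, verb, adverb...", and words with any other tag are simply skipped here.

```python
# Hypothetical tag sets based on Penn Treebank prefixes (an assumption,
# not taken from the slides).
TOPICAL_PREFIXES = ("NN",)             # nouns -> x = 0
OPINION_PREFIXES = ("JJ", "VB", "RB")  # adjectives, verbs, adverbs -> x = 1


def pos_category(tag):
    """Return 0 for a topical word, 1 for an opinion word, None otherwise."""
    if tag.startswith(TOPICAL_PREFIXES):
        return 0
    if tag.startswith(OPINION_PREFIXES):
        return 1
    return None


def partition(tagged_tokens):
    """Split (word, tag) pairs into topical words and opinion words."""
    topical, opinion = [], []
    for word, tag in tagged_tokens:
        x = pos_category(tag)
        if x == 0:
            topical.append(word)
        elif x == 1:
            opinion.append(word)
    return topical, opinion
```

The tagged input is expected to come from any part-of-speech tagger that emits Penn Treebank tags.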
12 / 31
22. C1: Viewpoint Discovery in Text Documents
VODUM: the Viewpoint and Opinion Discovery Unification Model
Sentence-level topic assignments z instead of word-level to better align topical words and
opinion words with the sentence’s topic
Document-level viewpoint assignments v: an opinionated document is usually written by
one author, i.e., according to one viewpoint
Viewpoint-specific topic distributions θ instead of document-specific: [Qiu+, NAACL ’13]
observed that different viewpoints have different dominating topics
12 / 31
23. C1: Viewpoint Discovery in Text Documents
VODUM: the Viewpoint and Opinion Discovery Unification Model
Approximate posterior inference using collapsed Gibbs sampling
Dirichlet distributions θ, π, φ0, φ1 integrated out
Successively sample discrete latent variables z, v from their posterior distributions (i.e., given
observations w, x)
=⇒ Provides distributions’ estimators θ̂, π̂, φ̂0, φ̂1 and assignments v and z for all docs/sentences
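As a rough illustration of how such estimators are typically read off the sampler's counts, the sketch below computes the posterior mean of a multinomial under a symmetric Dirichlet prior. The function name `dirichlet_mean` and the symmetric-prior form are assumptions for illustration; the slides do not spell out VODUM's exact estimator formulas.

```python
def dirichlet_mean(counts, prior):
    """Posterior mean of a multinomial given counts and a symmetric
    Dirichlet prior with parameter `prior` on each component."""
    total = sum(counts) + prior * len(counts)
    return [(c + prior) / total for c in counts]
```

For instance, `counts` could hold the number of sentences assigned to each topic under one viewpoint, giving an estimate of that viewpoint's topic distribution.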
12 / 31
24. C1: Viewpoint Discovery in Text Documents
Experimental setup
Hyperparameters set to fixed values: α = 0.01, β0 = β1 = 0.01, η = 100
Evaluation based on the Bitterlemons collection (http://www.bitterlemons.net/),
introduced by [Lin+, CoNLL ’06], containing essays about the Israeli-Palestinian conflict
=⇒ Number of viewpoints V set to 2
Total number of documents: 594
Number of essays written by Israeli authors: 297
Number of essays written by Palestinian authors: 297
Viewpoint clustering performance measured in terms of Accuracy (≈ to what extent obtained
clusters and groundtruth classes overlap)
13 / 31
25. C1: Viewpoint Discovery in Text Documents
Baselines
State-of-the-art baselines:
Topic-Aspect Model (TAM) from [Paul+, AAAI ’10] where aspects ≈ viewpoints
Joint Topic Viewpoint (JTV) from [Trabelsi+, ICDM ’14] for arguing expression mining
Latent Dirichlet Allocation (LDA) from [Blei+, NIPS ’01] with T = V
Degenerate versions of VODUM: VODUM-D, VODUM-O, VODUM-W and VODUM-S
VODUM vs. each degenerate version in turn: VODUM-D, VODUM-O, VODUM-W, VODUM-S
14 / 31
29. C1: Viewpoint Discovery in Text Documents
Evaluation: viewpoint clustering
Clustering of document-level Israeli/Palestinian viewpoints (each boxplot drawn from 50 chains)
Higher accuracy = better clustering performance
VODUM > TAM, JTV, LDA: VODUM overall beats the state-of-the-art baselines
TAM > JTV, LDA: TAM performs the best among the state-of-the-art baselines
VODUM > VODUM-D: viewpoint-specific topic distributions slightly improve accuracy
VODUM > VODUM-O: opinion and topical words partitioning considerably improves accuracy
VODUM > VODUM-W: sentence-level topic assignments considerably improve accuracy
VODUM > VODUM-S: document-level viewpoint assignments slightly improve accuracy
[Figure: boxplots of Accuracy (y-axis, 0.50 to 0.85) per model (x-axis): VODUM, TAM, JTV, LDA, VODUM-D, VODUM-O, VODUM-W, VODUM-S]
15 / 31
30. C1: Viewpoint Discovery in Text Documents
Evaluation: qualitative analysis
Most probable topical/opinion words for a topic (manually annotated as “Middle East conflicts”)
Topic: Middle East conflicts
Topical words: israel palestinian syria jihad war iraq dai suicid destruct iran
Opinion words (I, Israeli viewpoint): islam isra terrorist recent militari intern like heavi close american
Opinion words (P, Palestinian viewpoint): need win think sai don strong new sure believ commit
Topical words are unbiased towards a viewpoint and clearly reflect Middle East conflicts:
e.g., syria, war, iraq and destruct
Coherent opinion words for the Israeli viewpoint: e.g., islam, terrorist and american
The Palestinian viewpoint remains non-specific about conflicts and does not mention islam or
terrorism: e.g., win, strong and commit
16 / 31
31. C2: Viewpoint Discovery in Social Networks
C2: Viewpoint Discovery in Social Networks
17 / 31
32. C2: Viewpoint Discovery in Social Networks
Task
Discover topics and viewpoints from social networking data, leveraging both posted text
content and social interactions between users
18 / 31
33. C2: Viewpoint Discovery in Social Networks
SNVDM: the Social Network Viewpoint Discovery Model
We defined the Social Network Viewpoint Discovery Model to jointly discover topics and
viewpoints from posted text content and social interactions [Thonet+, CIKM ’17]
19 / 31
35. C2: Viewpoint Discovery in Social Networks
SNVDM: the Social Network Viewpoint Discovery Model
Text content component
Observed data: tokens occurring
in documents posted by users
=⇒ 3 nested plates
Latent topics assigned to each
token
Latent viewpoints assigned at
document-level
19 / 31
36. C2: Viewpoint Discovery in Social Networks
SNVDM: the Social Network Viewpoint Discovery Model
Text content component
Following the Topic-Aspect Model
from [Paul+, AAAI ’10], definition of
four word types specified by switch
variables ℓ (level) and x (route):
Background words =⇒ ℓ = 0, x = 0
Viewpoint words =⇒ ℓ = 0, x = 1
Topic words =⇒ ℓ = 1, x = 0
Viewpoint-topic words =⇒ ℓ = 1, x = 1
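The four word types listed above amount to a lookup on the two binary switches. A minimal sketch follows; the names `WORD_TYPES`, `word_type` and `conditioning` are hypothetical, and reading the level switch as "depends on the topic" and the route switch as "depends on the viewpoint" is an interpretation of the mapping on the slide.

```python
# (level, route) -> word type, as listed on the slide.
WORD_TYPES = {
    (0, 0): "background",
    (0, 1): "viewpoint",
    (1, 0): "topic",
    (1, 1): "viewpoint-topic",
}


def word_type(level, route):
    """Name of the word type selected by the two switch variables."""
    return WORD_TYPES[(level, route)]


def conditioning(level, route, topic, viewpoint):
    """Latent indices the word distribution is conditioned on:
    the topic only if level == 1, the viewpoint only if route == 1."""
    return (topic if level == 1 else None,
            viewpoint if route == 1 else None)
```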
19 / 31
38. C2: Viewpoint Discovery in Social Networks
SNVDM: the Social Network Viewpoint Discovery Model
Social interaction component
Outgoing interactions for user u =
interactions initiated by u on another
user (recipient r)
Following SN-LDA from [Sachan+,
WSDM ’14], viewpoints assigned to
outgoing interactions (homophily)
19 / 31
39. C2: Viewpoint Discovery in Social Networks
SNVDM: the Social Network Viewpoint Discovery Model
Social interaction component
. . . But outgoing interactions
insufficient for some users
=⇒ We propose to also exploit
incoming interactions
19 / 31
40. C2: Viewpoint Discovery in Social Networks
SNVDM: the Social Network Viewpoint Discovery Model
Social interaction component
Incoming interactions for user u =
interactions initiated by another user
(sender s) on u
Viewpoint assigned to the document
being interacted upon
19 / 31
41. C2: Viewpoint Discovery in Social Networks
SNVDM: the Social Network Viewpoint Discovery Model
Approximate inference based on
Collapsed Gibbs Sampling
Dirichlet/Bernoulli distributions σ,
ψ, θ, π, φ, ξ integrated out
Successively sample discrete
latent variables ℓ, x, z, v, v′ from
their posterior distributions (i.e.,
given observations w, s, r)
Hyperparameters δ, γ, α, η, µ
sampled according to the
auxiliary variable technique
following [Newman+, J. Mach.
Learn. Res. ’09] and β fixed to
0.01
19 / 31
42. C2: Viewpoint Discovery in Social Networks
Limits of SNVDM’s social interaction component
Some users have very few social interactions
=⇒ Difficult to identify their viewpoints based on scarce direct interactions
20 / 31
43. C2: Viewpoint Discovery in Social Networks
Limits of SNVDM’s social interaction component
We propose to extend SNVDM to leverage “acquaintances of acquaintances” (≈ friends of friends)
How? =⇒ Generalized Pólya Urn scheme
20 / 31
44. C2: Viewpoint Discovery in Social Networks
SNVDM-GPU: extension of SNVDM based on Generalized Pólya Urn
Using Generalized Pólya Urn in SNVDM requires minor changes in collapsed Gibbs sampling
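E.g., for outgoing interaction o from user u on user u′:
SNVDM:
\[
p(v_{uo} = v \mid r_{uo} = u', \text{rest}) \propto
\frac{n_{uv} + \eta \frac{1}{V}}{n_{u\bullet} + \eta}
\cdot
\frac{n_{vu'} + \mu \frac{1}{U}}{n_{v\bullet} + \mu}
\]
SNVDM-GPU:
\[
p(v_{uo} = v \mid r_{uo} = u', \text{rest}) \propto
\frac{n_{uv} + \eta \frac{1}{V}}{n_{u\bullet} + \eta}
\cdot
\frac{n_{vu'} + \sum_{u'' \in R^{\tau}_{u'}} \lambda \, n_{vu''} + \mu \frac{1}{U}}
     {n_{v\bullet} + \sum_{u'''=1}^{U} \sum_{u'' \in R^{\tau}_{u'''}} \lambda \, n_{vu''} + \mu}
\]
where R^τ_{u′} denotes the τ most interacting acquaintances of user u′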
21 / 31
45. C2: Viewpoint Discovery in Social Networks
Experimental setup
Twitter datasets from [Brigadir+, WebSci ’15] on the 2014 Scottish Independence
Referendum (v = Yes/No) and the 2014 US Midterm Elections (v = Democrat/Republican)
Dataset    #Users (Yes/Dem.)  #Users (No/Rep.)  #Tweets  #Tokens    Vocabulary  #Interactions
Indyref    589                575               270,075  2,043,204  38,942      696,654
Midterms   767                778               113,545  975,199    25,312      241,741
22 / 31
46. C2: Viewpoint Discovery in Social Networks
Experimental setup
Twitter datasets from [Brigadir+, WebSci ’15] on the 2014 Scottish Independence
Referendum (v = Yes/No) and the 2014 US Midterm Elections (v = Democrat/Republican)
State-of-the-art baselines:
Topic-Aspect Model (TAM) from [Paul+, AAAI ’10]
=⇒ Only text content to discover viewpoints and topics
Social Network Latent Dirichlet Allocation (SN-LDA) from [Sachan+, WSDM ’14]
=⇒ Text content and outgoing interactions to discover communities (≈ viewpoints) and topics
Viewpoint and Opinion Discovery Unification Model (VODUM) from [Thonet+, ECIR ’16]
=⇒ Text content to discover viewpoints and topics, and parts of speech to distinguish between
topic words and viewpoint-topic words
22 / 31
49. C2: Viewpoint Discovery in Social Networks
Experimental setup
Twitter datasets from [Brigadir+, WebSci ’15] on the 2014 Scottish Independence
Referendum (v = Yes/No) and the 2014 US Midterm Elections (v = Democrat/Republican)
State-of-the-art baselines:
Topic-Aspect Model (TAM) from [Paul+, AAAI ’10]
Social Network Latent Dirichlet Allocation (SN-LDA) from [Sachan+, WSDM ’14]
Viewpoint and Opinion Discovery Unification Model (VODUM) from [Thonet+, ECIR ’16]
Degenerate version of SNVDM: SNVDM-WI (without incoming interactions)
Proposed models:
SNVDM
SNVDM-GPU (τ = 10): only 10 most interacting acquaintances used in Generalized Pólya Urns
SNVDM-GPU (τ = ∞): all acquaintances used in Generalized Pólya Urns
Viewpoint clustering performance measured in terms of Purity (≈ to what extent obtained
clusters are homogeneous) and Normalized Mutual Information (information theoretic
clustering measure)
22 / 31
50. C2: Viewpoint Discovery in Social Networks
Evaluation: viewpoint clustering on Indyref
Observation 1: consistent results across different numbers of topics
Observation 2: SNVDM, SNVDM-GPU (τ = 10), SNVDM-GPU (τ = ∞) > all baselines
Observation 3: SN-LDA > TAM, VODUM =⇒ interactions ↑↑↑
Observation 4: SNVDM > SNVDM-WI =⇒ incoming interactions ↑↑
Observation 5: SNVDM-GPU (τ = ∞) > SNVDM-GPU (τ = 10) > SNVDM =⇒ GPU ↑
23 / 31
55. C2: Viewpoint Discovery in Social Networks
Evaluation: viewpoint clustering on Midterms
Observation 6: similar trends on Midterms but greater improvement for our models over baselines
23 / 31
56. C2: Viewpoint Discovery in Social Networks
Evaluation: impact of social network sparsity
Clustering of users’ viewpoints on Indyref for different degrees of network sparsity (T = 10)
Observation: performance degraded for lower percentage of interactions
24 / 31
57. C2: Viewpoint Discovery in Social Networks
Evaluation: qualitative analysis
Most probable topic words and viewpoint-topic words for topics from Indyref and Midterms
Topic: Scottish independence
Neutral: #indyref, scotland, independence, vote, campaign, scottish, uk, people, future, independent
Viewpoint Yes: #voteyes, yes, scotland, independence, westminster, vote, independent, country, #yes, #scotland
Viewpoint No: #indyref, uk, salmond, #bettertogether, #scotdecides, separation, currency, thanks, today, say
Topic: Energy and resources
Neutral: energy, house, new, gas, natural, #energy, #ff, #kxl, support, economic
Viewpoint Dem.: #actonclimate, climate, #p2, change, #climatechange, clean, oil, energy, #gop, seec
Viewpoint Rep.: #4jobs, #obamacare, #jobs, gop, obama, bills, jobs, house, act, watch
Reasonable coherence of topic words and viewpoint-topic words
Topic words indeed unbiased towards any viewpoints
Use of viewpoint-specific hashtags and mention of different issues for different viewpoints
25 / 31
59. Conclusion
Summary of contributions
VODUM discovers viewpoints and topics
in text documents, exploiting parts of
speech to distinguish between topical
words and opinion words [Thonet+,
ECIR ’16]
Lessons learned: opinion and topical
words partitioning ↑↑↑, sentence-level
topic assignments ↑↑↑
SNVDM(-GPU) discovers viewpoints and
topics in social networks, leveraging
both posted text content and social
interactions [Thonet+, CIKM ’17]
Lessons learned: social interactions ↑↑↑
27 / 31
60. Conclusion
Perspectives
Integrate time dimension and geolocation, e.g., to analyze party support during elections
Model viewpoints as real-valued variables to better capture nuanced opinions
Design a viewpoint summarization framework to build argument maps and help mitigate
the filter bubble and echo chamber phenomena
28 / 31
62. Conclusion
References
Barberá, P. (2015). Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using
Twitter Data. Polit. Anal., 23(1), 76–91.
Blei, D. M., Ng, A. Y., Jordan, M. I. (2001). Latent Dirichlet Allocation. In Proc. of NIPS ’01 (pp. 601–608).
Brigadir, I., Greene, D., Cunningham, P. (2015). Analyzing Discourse Communities with Distributional
Semantic Models. In Proc. of WebSci ’15.
Fang, Y., Si, L., Somasundaram, N., Yu, Z. (2012). Mining Contrastive Opinions on Political Texts using
Cross-Perspective Topic Model. In Proc. of WSDM ’12 (pp. 63–72).
Joshi, A., Bhattacharyya, P., Carman, M. (2016). Political Issue Extraction Model: A Novel Hierarchical
Topic Model That Uses Tweets By Political And Non-Political Authors. In Proc. of WASSA@NAACL-HLT
’16 (pp. 82–90).
Lin, W.-H., Wilson, T., Wiebe, J., Hauptmann, A. (2006). Which Side are You on? Identifying
Perspectives at the Document and Sentence Levels. In Proc. of CoNLL ’06 (pp. 109–116).
Liu, Z., Zheng, Q., Wang, F., Tian, Z., Li, B. (2014). A Dynamic Nonparametric Model for Characterizing
the Topical Communities in Social Streams. In Proc. of SDM ’14 (pp. 379–387).
Newman, D., Asuncion, A., Smyth, P., Welling, M. (2009). Distributed Algorithms for Topic Models. J. of
Mach. Learn. Res., 10, 1801–1828.
Paul, M. J., Girju, R. (2010). A Two-Dimensional Topic-Aspect Model for Discovering Multi-Faceted
Topics. In Proc. of AAAI ’10 (pp. 545–550).
30 / 31
63. Conclusion
References (continued)
Paul, M. J., Zhai, C., Girju, R. (2010). Summarizing Contrastive Viewpoints in Opinionated Text. In Proc.
of EMNLP ’10 (pp. 66–76).
Qiu, M., Jiang, J. (2013). A Latent Variable Model for Viewpoint Discovery from Threaded Forum Posts.
In Proc. of NAACL-HLT ’13 (pp. 1031–1040).
Qiu, M., Yang, L., Jiang, J. (2013). Modeling Interaction Features for Debate Side Clustering. In Proc. of
CIKM ’13 (pp. 873–878).
Sachan, M., Dubey, A., Srivastava, S., Xing, E. P., Hovy, E. (2014). Spatial Compactness meets Topical
Consistency: Jointly Modeling Links and Content for Community Detection. In Proc. of WSDM ’14 (pp.
503–512).
Thonet, T., Cabanac, G., Boughanem, M., Pinel-Sauvagnat, K. (2016). VODUM: A Topic Model Unifying
Viewpoint, Topic and Opinion Discovery. In Proc. of ECIR ’16 (pp. 533–545).
Thonet, T., Cabanac, G., Boughanem, M., Pinel-Sauvagnat, K. (2017). Users Are Known by the
Company They Keep: Topic Models for Viewpoint Discovery in Social Networks. In Proc. of CIKM ’17 (pp.
87–96).
Trabelsi, A., Zaiane, O. R. (2014). Mining Contentious Documents Using an Unsupervised Topic Model
Based Approach. In Proc. of ICDM ’14 (pp. 550–559).
Turney, P. D. (2002). Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised
Classification of Reviews. In Proc. of ACL ’02 (pp. 417–424).
31 / 31
64. Conclusion
Appendix: clustering metrics
Given the groundtruth classes S = {S1, S2}, the obtained clusters C = {C1, C2}, and the
document collection D:
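\[
\mathrm{Acc}(C, S) = \frac{1}{|D|} \max\bigl( |C_1 \cap S_1| + |C_2 \cap S_2|,\; |C_1 \cap S_2| + |C_2 \cap S_1| \bigr)
\]
\[
\mathrm{Purity}(C, S) = \frac{1}{|D|} \Bigl( \max\bigl( |C_1 \cap S_1|, |C_1 \cap S_2| \bigr) + \max\bigl( |C_2 \cap S_1|, |C_2 \cap S_2| \bigr) \Bigr)
\]
\[
\mathrm{NMI}(C, S) = \frac{2\, I(C, S)}{H(C) + H(S)}
\]
with
\[
I(C, S) = \sum_{j,k} \frac{|C_j \cap S_k|}{|D|} \log \frac{|D|\, |C_j \cap S_k|}{|C_j|\, |S_k|}
\quad \text{and} \quad
H(C) = - \sum_{j} \frac{|C_j|}{|D|} \log \frac{|C_j|}{|D|}
\]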
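The three measures above can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions: cluster and class labels are given as two parallel lists, labels are 0/1 for the two-viewpoint accuracy, and the function names are hypothetical. NMI is undefined when both partitions have a single group, a case this sketch does not handle.

```python
import math
from collections import Counter


def accuracy(clusters, classes):
    """Two-cluster accuracy: best of the two cluster-to-class matchings.
    Assumes labels 0/1 in both lists."""
    n = len(clusters)
    agree = sum(c == s for c, s in zip(clusters, classes))
    return max(agree, n - agree) / n


def purity(clusters, classes):
    """Share of documents that belong to the majority class of their cluster."""
    best = {}
    for (c, _s), count in Counter(zip(clusters, classes)).items():
        best[c] = max(best.get(c, 0), count)
    return sum(best.values()) / len(clusters)


def entropy(labels):
    """Entropy H of a partition given as a list of labels."""
    n = len(labels)
    return -sum(k / n * math.log(k / n) for k in Counter(labels).values())


def nmi(clusters, classes):
    """Normalized mutual information 2 I(C, S) / (H(C) + H(S))."""
    n = len(clusters)
    size_c, size_s = Counter(clusters), Counter(classes)
    mutual = sum(
        k / n * math.log(n * k / (size_c[c] * size_s[s]))
        for (c, s), k in Counter(zip(clusters, classes)).items()
    )
    return 2 * mutual / (entropy(clusters) + entropy(classes))
```

Accuracy and NMI are invariant to swapping the two cluster labels, which matters because an unsupervised model has no way to know which cluster corresponds to which groundtruth class.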
31 / 31
65. Conclusion
Appendix: perplexity analysis for VODUM and baselines
Held-out perplexity computed through 10-fold cross validation for T ∈ {5, 10, 15, 20, 30, 50}
Lower perplexity = better generalization
performance
TAM LDA for small number of topics
( 20) and LDA TAM for large
number of topics ( 20)
JTV TAM, LDA for all number of
topics
VODUM TAM, JTV, LDA for all
number of topics
Note: VODUM’s vocabulary differs from that of TAM, JTV and LDA:
partitioning of topical words and
opinion words
[Figure: average perplexity (y-axis, 400 to 900) against number of topics T (x-axis, 10 to 50) for VODUM, TAM, JTV and LDA]
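For reference, held-out perplexity is the exponential of the negative average log-likelihood per held-out token. The sketch below assumes the per-token log-probabilities have already been computed by a fitted model; the function names are hypothetical, and averaging over folds is one plausible reading of "average perplexity".

```python
import math


def perplexity(token_log_probs):
    """exp of the negative mean log-probability of the held-out tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def cross_validated_perplexity(folds):
    """Mean perplexity over folds, each fold being a list of log-probabilities."""
    return sum(perplexity(fold) for fold in folds) / len(folds)
```

A model that assigns probability 0.25 to every held-out token has perplexity 4, i.e. it is as uncertain as a uniform choice among four words.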
31 / 31
66. Conclusion
Appendix: execution time for SNVDM and baselines
Execution time (in seconds) of one Gibbs sampling iteration on Indyref (with T = 10) and
Midterms (with T = 15)
Model                 Indyref  Midterms
TAM                   1.45     0.87
SN-LDA                1.18     0.64
VODUM                 2.78     1.85
SNVDM-WI              2.08     1.08
SNVDM                 2.49     1.15
SNVDM-GPU (τ = 10)    3.47     1.34
SNVDM-GPU (τ = ∞)     14.67    2.56
31 / 31
67. Conclusion
Appendix: Simple Pólya Urn scheme
The compound Dirichlet-Multinomial distribution (used in LDA-based topic models) can be
interpreted as an urn sampling metaphor with an over-replacement policy
1. Randomly draw a ball from the urn
2. Duplicate the drawn ball (the duplicate comes from an infinite ball generator)
3. Put back in the urn the original ball and its duplicate
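The urn steps can be simulated directly. The sketch below is illustrative only: the function name, the dictionary representation of the urn and the fixed seed are assumptions.

```python
import random


def simple_polya_urn(initial, draws, seed=0):
    """Simulate a simple Pólya urn.
    `initial` maps each color to its starting number of balls."""
    rng = random.Random(seed)
    urn = dict(initial)
    for _ in range(draws):
        colors = list(urn)
        # Draw a ball with probability proportional to its color's count.
        color = rng.choices(colors, weights=[urn[c] for c in colors])[0]
        # The original ball goes back together with one duplicate.
        urn[color] += 1
    return urn
```

Each draw makes the drawn color more likely next time, the "rich get richer" behavior behind the Dirichlet-Multinomial.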
31 / 31
68. Conclusion
Appendix: Generalized Pólya Urn scheme
The Simple Pólya Urn scheme can be generalized by modifying the replacement rule to exploit
similarities between balls’ colors [Mahmoud, 2008]
1. Randomly draw a ball from the urn
2. Duplicate the drawn ball and generate parts of balls for those similar to the drawn ball (both come from an infinite ball generator)
3. Put back in the urn the original ball, its duplicate and the parts of similar balls
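A minimal simulation of the generalized replacement rule is sketched below. The `similar` mapping, the single `weight` given to every similar color and the function name are assumptions made for illustration, not details stated on the slide.

```python
import random


def generalized_polya_urn(initial, similar, weight, draws, seed=0):
    """Simulate a Generalized Pólya Urn.
    `similar` maps a color to the colors considered similar to it;
    each similar color receives `weight` (a fraction of a ball) per draw."""
    rng = random.Random(seed)
    urn = {color: float(count) for color, count in initial.items()}
    for _ in range(draws):
        colors = list(urn)
        color = rng.choices(colors, weights=[urn[c] for c in colors])[0]
        urn[color] += 1.0            # duplicate of the drawn ball
        for other in similar.get(color, ()):
            urn[other] += weight     # parts of balls for similar colors
    return urn
```

Setting `weight` to 0 recovers the simple scheme, since no similar color is ever reinforced.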
31 / 31
69. Conclusion
Appendix: SNVDM-GPU
Using Generalized Pólya Urn in SNVDM requires minor changes in collapsed Gibbs sampling
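E.g., for outgoing interaction o from user u on user u′:
SNVDM:
\[
p(v_{uo} = v \mid r_{uo} = u', \text{rest}) \propto
\frac{n_{uv} + \eta \frac{1}{V}}{n_{u\bullet} + \eta}
\cdot
\frac{n_{vu'} + \mu \frac{1}{U}}{n_{v\bullet} + \mu}
\]
SNVDM-GPU:
\[
p(v_{uo} = v \mid r_{uo} = u', \text{rest}) \propto
\frac{n_{uv} + \eta \frac{1}{V}}{n_{u\bullet} + \eta}
\cdot
\frac{\sum_{u''=1}^{U} A_{u''u'} \, n_{vu''} + \mu \frac{1}{U}}
     {\sum_{u''=1}^{U} A_{u''\bullet} \, n_{vu''} + \mu}
\]
The addition matrix A defines the weight to put on count n_{vu″} for each u″:
\[
A_{u''u'} =
\begin{cases}
1 & \text{if } u'' = u', \\
\lambda & \text{if } u'' \text{ is among the top } \tau \text{ acquaintances of } u', \\
0 & \text{otherwise}
\end{cases}
\]
with 0 ≤ λ ≤ 1 (λ = 0 =⇒ “vanilla” SNVDM) and τ ∈ N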
weaker link
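The weighting by the addition matrix A can be sketched as follows. This is an illustrative reading of the formulas, not the authors' implementation: the function names, the nested-list layout of the counts and the precomputed `top_acquaintances` mapping are assumptions, and the counts are assumed to already exclude the interaction being resampled.

```python
def addition_matrix(top_acquaintances, num_users, lam):
    """A[u2][u1] = 1 if u2 == u1, lam if u2 is among the top
    acquaintances of u1, and 0 otherwise."""
    A = [[0.0] * num_users for _ in range(num_users)]
    for u1 in range(num_users):
        A[u1][u1] = 1.0
        for u2 in top_acquaintances.get(u1, ()):
            if u2 != u1:
                A[u2][u1] = lam
    return A


def viewpoint_weights(n_user_view, n_view_user, A, u, recipient, eta, mu):
    """Unnormalized p(v | recipient, rest) for every viewpoint v.
    n_user_view[u][v]: interactions of user u assigned to viewpoint v.
    n_view_user[v][u2]: interactions with viewpoint v received by user u2."""
    V, U = len(n_view_user), len(A)
    weights = []
    for v in range(V):
        user_term = (n_user_view[u][v] + eta / V) / (sum(n_user_view[u]) + eta)
        num = sum(A[u2][recipient] * n_view_user[v][u2] for u2 in range(U)) + mu / U
        den = sum(sum(A[u2]) * n_view_user[v][u2] for u2 in range(U)) + mu
        weights.append(user_term * num / den)
    return weights
```

With `lam = 0` the matrix A is the identity and the weights reduce to the plain SNVDM expression; a sampler would normalize the returned weights before drawing the new viewpoint.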
31 / 31