Helping Users Discover Perspectives: Enhancing Opinion Mining with Joint Topic Models

TimDraws
Dec. 11, 2020

  1. WIS Web Information Systems. Helping users discover perspectives: Enhancing opinion mining with joint topic models. Tim Draws, Jody Liu, Nava Tintarev. TU Delft, The Netherlands. t.a.draws@tudelft.nl, https://timdraws.net
  2. Discovering perspectives
  3. Discovering perspectives
     [Diagram: Unstructured set of textual opinions → ? → Structured set of perspectives (Perspective 1 through Perspective 10, with Supporting and Opposing labels)]
  4. Topic models
     • Topic model = unsupervised model to discover hidden structures (i.e., topics) in corpora of text
       – Example: Latent Dirichlet Allocation (LDA) [1]
       – Topics are probability distributions over words
       – If applied to a corpus of documents related to a debate, topics could be interpreted as perspectives
     • Joint topic model = a classical topic model (e.g., LDA) extended with additional components (e.g., sentiment analysis)
  5. Our paper
     RQ1. Can joint topic models support users in discovering perspectives in a corpus of opinionated documents?
     RQ2. Do users interpret the output of joint topic models in line with their personal pre-existing stance?
     Contributions: (1) perspective-annotated data set; (2) user study
  6. Data
     Document | Stance | Perspective
     You cannot be a Christian and support abortion… | Against | Abortion is the killing of a human being, which defies the word of God.
     No one in the world has any right to judge over what someone else does with their body, … | For | Reproductive choice empowers women by giving them control over their own bodies.
     Why put a child through the pain of an unloving mother… | For | A baby should not come into the world unwanted.
     … | … | …
     Final data set: 600 documents; 6 perspectives
  7. Experimental setup
     • Ran each model on the final data set (i.e., for 6 topics)
     • Between-subjects study: each participant sees the output of one of the models
     • Participants need to identify the correct 6 perspectives from the model output
  8. Procedure
     Step 1: Participants state:
       • Age
       • Gender
       • Personal stance towards abortion
       • Familiarity with the abortion debate
     Step 2: Perspective identification task
     Step 3: Participants state:
       • Perceived usefulness
       • Perceived awareness increase
       • Confidence in task performance
  9. Results: descriptive
     • 158 participants (recruited from Prolific)
       – After excluding 12 participants due to failing both honeypot topics
       – 150 required according to power analysis
     • 50.6% female, 49.4% male
     • 33.3 years old on average (range 18 to 64)
     • Most (57.8%) at least somewhat familiar with the topic
     • Sample skewed towards the supporting viewpoint
  10. Results: hypothesis tests
     H1: Users find more correct perspectives when being exposed to the output of a joint topic model compared to the output of a regular topic model or baseline.
       – We find a difference between models (p < 0.001, η² = 0.126)
       – TAM is the only one that performs significantly better than the baseline
     [Figure: mean number of correct perspectives (MeannCor) per model: TF-IDF, LDA, JST, VODUM, TAM, LAM]
  11. Results: hypothesis tests
     H2: Users are more likely to identify sets of keywords as perspectives that are in line with their personal stance compared to perspectives that they do not agree with.
       – No evidence for such a relationship (ρ = 0.122, p = 0.163)
  12. Discussion and future work
     • Why did TAM perform better?
       – It extracted more keywords that appeared explicitly in the perspective expression
         Abortion is the killing of a human being, which defies the word of God.
         Reproductive choice empowers women by giving them control over their own bodies.
         A baby should not come into the world unwanted.
     • Future work: different domains, novel topic models
  13. Take home
     • Joint topic models such as TAM can perform perspective discovery
     • No evidence for a tendency of users to interpret output in line with their personal stance
     • Implications for several areas: journalism, policy-making, generating explanations
     (All supplementary materials are openly available at https://osf.io/uns63/.)
  14. References
     [1] D. Blei, A. Ng, and M. Jordan, “Latent dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 05 2003.
     [2] M. Paul and R. Girju, “A two-dimensional topic-aspect model for discovering multi-faceted topics,” in AAAI, vol. 1, 01 2010. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.226.3550&rep=rep1&type=pdf
     [3] C. Lin and Y. He, “Joint sentiment/topic model for sentiment analysis,” in Proceedings of the 18th ACM Conference on Information and Knowledge Management, ser. CIKM ’09. New York, NY, USA: Association for Computing Machinery, 2009, pp. 375–384. [Online]. Available: https://doi.org/10.1145/1645953.1646003
     [4] T. Thonet, G. Cabanac, M. Boughanem, and K. Pinel-Sauvagnat, “Vodum: A topic model unifying viewpoint, topic and opinion discovery,” in ECIR, vol. 9626. Toulouse, France: Springer, 03 2016, pp. 533–545.
     [5] D. Vilares and Y. He, “Detecting perspectives in political debates,” in EMNLP. Association for Computational Linguistics, 01 2017, pp. 1573–1582.
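
The H1 result on slide 10 reports a difference between models together with an effect size η². As a minimal sketch of how such an effect size falls out of a one-way between-subjects ANOVA, the snippet below computes F and η² by hand. The function name `one_way_anova` and the scores are invented for illustration; they are not the study's data or analysis code.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, eta_squared) for a one-way between-subjects ANOVA.

    `groups` maps a condition name to a list of per-participant scores.
    """
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)

    # Variation explained by condition membership.
    ss_between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    # Variation left over inside each condition.
    ss_within = sum(
        (x - mean(scores)) ** 2
        for scores in groups.values()
        for x in scores
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


# Hypothetical numbers of correctly identified perspectives per participant.
scores = {
    "TF-IDF": [3, 4, 3, 4],
    "LDA": [4, 3, 4, 4],
    "TAM": [5, 5, 4, 5],
}
f_stat, eta_sq = one_way_anova(scores)
print(f"F = {f_stat:.2f}, eta squared = {eta_sq:.3f}")
```

η² is the share of total variance in the scores that is attributable to the model condition; the p-value would additionally require the F distribution, which this sketch leaves out.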

Editor's Notes

  1. Introduce myself. Second-year PhD.
  2. Imagine you are a journalist writing an article about the abortion debate. Abortion is a commonly debated topic with many people on both sides, and many perspectives. Explain stance-perspective difference. Naturally, these debates are carried out online in news, social media, and fora. For you as a journalist, it would be great to have an automatic way to distil these perspectives.
  3. Formalize. What existing techniques could be used here? Sentiment analysis and stance detection no good because supervised. Perspectives are unstructured and different for every topic → unsupervised.
  4. In sum, two research questions. To answer them, we created a data set (openly available) and conducted a user study showing that some joint topic models can perform perspective discovery.
  5. Needed data set of opinionated documents with perspective annotation. Documents: around 3000 debate forum posts on abortion. Human annotator noted stance and perspective. Perspectives taken from ProCon list of 31. Then balanced data set of 600 documents.
  6. First describe joint topic models, then baselines. Ran all these models on the final corpus and then conducted user study with their output.
  7. Between-subjects design, randomly assigned each participant to one model. Topic model output on the left (6 topics + two honeypots). Select one of 16 different perspectives for each topic. In Step 3 we measured experience with the task.
  8. Interesting that they were skewed, as we performed Prolific pre-screening.
  9. Describe again why and what we did in this hypothesis test; we used ANOVA. Post hoc tests: TAM is the only one that is better than the TF-IDF baseline model.
  10. Confirmation bias (ambiguous model output). Spearman correlation: no evidence.
  11. Normalized distribution over perspectives (x-axis). P1-P6 in the corpus, rest not. Plot shows how often each perspective was selected. Some perspectives were well represented in all models, like P5 (or people are familiar with them). TAM was good with perspectives that other models struggled with, such as P1 and P6. More exploratory results in the paper.
  12. Other topics: more sentiment-related words. Future work: different domains, novel topic models.
  13. Supplementary material is available on our repository. Generating explanations to help people overcome biases.
  14. Not in that order (see paper)
  15. Not in that order (see paper)
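
Note 10 mentions that H2 was checked with a Spearman correlation. As a rough sketch of what that statistic measures, the snippet below computes Spearman's ρ as the Pearson correlation of average ranks. The helper names and the example values (a stance score and a count of stance-consistent selections per participant) are assumptions made for illustration and are not taken from the study.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-participant values, for illustration only.
stance = [-3, -1, 0, 2, 3, 1]
stance_consistent_selections = [2, 1, 2, 3, 2, 3]
rho = spearman_rho(stance, stance_consistent_selections)
print(f"rho = {rho:.3f}")
```

A ρ near zero, as reported on slide 11, means that the rank order of one variable says little about the rank order of the other.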