Clare Llewellyn Lasiuk July 5th 2013



Using argument analysis to structure user generated content.





Usage Rights

© All Rights Reserved

Presentation Transcript

  • Clare Llewellyn, University of Edinburgh: Argumentation on the web - always vulgar and often convincing?
  • User Generated Content
  • Various Conversations
  • Various Conversations: main points of discussion
    – RM is bad / old / Australian / has power over politicians / owns newspapers
    – RM does / doesn’t understand the internet
    – Free content is good / bad
    – The joke belongs to Tim Vine or Stewart Francis
    – Wider context discussion: PIPA / SOPA, Leveson Inquiry, phone hacking, TVShack
  • The Problem: can we somehow structure this data so that we can read it and add to it at the most relevant point?
  • Solutions?
  • Argumentation
    – A participant makes a claim that represents their position.
    – The participant backs up that claim with evidence.
    – A counter claim challenges the position.
    – The composer of the original claim may evaluate their position.
  • Diagram labels: Claim, Counter Claim, Evidence, Counter Evidence, Evaluation
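The claim, evidence, counter claim and evaluation elements described on the argumentation slides can be held in a small tree structure. The sketch below is one possible representation and not something shown in the talk; the `Claim` class, its field names and the `outline` helper are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Claim:
    """A position put forward by a participant (hypothetical structure)."""
    author: str
    text: str
    evidence: List[str] = field(default_factory=list)
    counter_claims: List["Claim"] = field(default_factory=list)
    evaluation: Optional[str] = None  # the original author's later reassessment

    def challenge(self, author: str, text: str) -> "Claim":
        """Attach a counter claim to this claim and return it."""
        counter = Claim(author=author, text=text)
        self.counter_claims.append(counter)
        return counter


def outline(claim: Claim, depth: int = 0) -> List[str]:
    """Flatten the claim tree into indented lines for display."""
    pad = "  " * depth
    label = "Claim" if depth == 0 else "Counter claim"
    lines = [f"{pad}{label} ({claim.author}): {claim.text}"]
    lines += [f"{pad}  Evidence: {item}" for item in claim.evidence]
    for counter in claim.counter_claims:
        lines += outline(counter, depth + 1)
    if claim.evaluation:
        lines.append(f"{pad}  Evaluation: {claim.evaluation}")
    return lines
```

Because each counter claim is itself a `Claim`, a reader's new comment could in principle be attached at the most relevant node, which is the problem the talk sets out to address.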
  • Macro / Micro Argumentation (Weinberger and Fischer, 2006)
    – Micro-level:
      • Simple claim
      • Qualified claim
      • Grounded claim
      • Grounded and qualified claim
      • Non-argumentative moves
    – Macro-level:
      • Argument
      • Counter argument
      • Integration (reply)
      • Non-argumentative moves
  • Methodology (adapted from Bal & Saint-Dizier (2009) and Mochales & Moens (2009, 2011))
    1. Identify discussions on different topics
    2. Identify spans of text that represent the core points in the discussion
    3. Classify into a structure so as to define the relationships between spans of text
    4. Present this information to users
  • Data Sets
    – Hand-annotated corpus of tweets from the London Riots (7729)
    – Comments from the Guardian newspaper (partially hand-annotated for topic)
    – Tweets with the #OR2012 hashtag (5416)
  • 1. Topic Identification
    – Extract individual discussions
    – Unsupervised clustering: very objective
    – Selection of algorithm:
      • Unigram / Bigram Frequency
      • Incremental Clustering
      • K-means
      • Topic modelling
    – Possible tools: NLTK, Weka, Mallet, Twitter Workbench
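Of the algorithms listed for topic identification, incremental clustering over unigram frequencies is the simplest to sketch. The version below is a minimal single-pass illustration using only the Python standard library. It is not the implementation used in the talk: the tokenising regular expression, the cosine similarity measure and the `threshold` value of 0.3 are all assumptions.

```python
import math
import re
from collections import Counter
from typing import List


def unigrams(text: str) -> Counter:
    """Lower-case the text and count word unigrams (assumed tokenisation)."""
    return Counter(re.findall(r"[a-z0-9#@']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(texts: List[str], threshold: float = 0.3) -> List[int]:
    """Single pass: join the most similar existing cluster or start a new one."""
    centroids: List[Counter] = []  # summed term counts per cluster
    labels: List[int] = []
    for text in texts:
        vec = unigrams(text)
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best].update(vec)  # fold the text into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels
```

A single pass like this is order-dependent and sensitive to the threshold, which is one reason to compare it with k-means or topic modelling before choosing an algorithm.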
  • Example Clusters (figure panels: Topic Modelling, Incremental Clustering)
  • Evaluation
    – Are you doing what a human would do?
    – Results for comments data (figure not reproduced in the transcript)
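The transcript does not say which measure was used to compare the clusters with human judgement. One common choice, shown here purely as an assumption, is cluster purity: each cluster is credited with its most frequent human-assigned topic, and the score is the fraction of items that agree with that majority. The function name `purity` is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def purity(predicted: Sequence[Hashable], human: Sequence[Hashable]) -> float:
    """Fraction of items whose human label matches their cluster's majority label."""
    if len(predicted) != len(human):
        raise ValueError("predicted and human label lists must be the same length")
    if len(predicted) == 0:
        return 0.0
    by_cluster: Dict[Hashable, Counter] = defaultdict(Counter)
    for cluster_id, topic in zip(predicted, human):
        by_cluster[cluster_id][topic] += 1
    # Count the items that agree with the majority topic of their cluster.
    agree = sum(max(counts.values()) for counts in by_cluster.values())
    return agree / len(predicted)
```

Purity rewards many small clusters, so it is usually read alongside the number of clusters produced.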
  • 2. Text Span Identification
    – Define a set of rules that allows the extraction of macro-level argumentation
    – Annotated text: you can use machine learning
    – Non-annotated text: you can define rules. Is there something specific in the language that indicates a claim or counter claim?
    – Diagram labels: Claim, Counter Claim
  • Rules production
    – Method:
      • Rules are a generalisation from a large amount of data (14000 quotes)
      • Use words / POS / negation / symbols
      • Use the rules to find these patterns where they are not explicitly mentioned in the text
    – Examples:
      • Before: @USERNAME:
      • After: i don't; i think you; PRP VBP RB (personal pronoun, verb non-3rd-person singular present, adverb)
      • Both: START X i 'm not
    – Tools: LT-TTT2
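The word and part-of-speech patterns given as examples on the rules slide can be applied as simple sequence matches. The sketch below assumes the text has already been tokenised and tagged with Penn Treebank tags by some other tool, so the tags are supplied by hand here. It is not the LT-TTT2 rule format, and `WORD_RULES`, `POS_RULES`, `find_sequence` and `match_rules` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag), tagged elsewhere

# Patterns copied from the slide's "After" examples.
WORD_RULES = [("i", "don't"), ("i", "think", "you")]
POS_RULES = [("PRP", "VBP", "RB")]


def find_sequence(items: Sequence[str], pattern: Sequence[str]) -> List[int]:
    """Return every start index at which pattern occurs contiguously in items."""
    n = len(pattern)
    return [i for i in range(len(items) - n + 1)
            if tuple(items[i:i + n]) == tuple(pattern)]


def match_rules(tagged: List[Token]) -> List[Tuple[str, int]]:
    """Return (rule name, start index) for every rule that fires, by position."""
    words = [word.lower() for word, _ in tagged]
    tags = [tag for _, tag in tagged]
    hits: List[Tuple[str, int]] = []
    for rule in WORD_RULES:
        hits += [("word:" + " ".join(rule), i) for i in find_sequence(words, rule)]
    for rule in POS_RULES:
        hits += [("pos:" + " ".join(rule), i) for i in find_sequence(tags, rule)]
    return sorted(hits, key=lambda hit: hit[1])
```

A rule that fires marks a candidate span boundary only; deciding whether the span is a claim or a counter claim is left to the classification step.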
  • 3. Classify into a structure
    – Method based on Rose et al. (2008)
    – Use supervised machine learning to classify tweets into an argument structure
    – Tools: TagHelper toolkit (based on Weka), LightSide
    – Steps:
      • Decide on a machine learning algorithm
      • Define feature sets
      • Train and test
  • Data Set: tweets coded with the classification system
    1. Claim without evidence
    2. Claim with evidence
    3. Counter-claim without evidence
    4. Counter-claim with evidence
    5. Implicit request for verification
    6. Explicit request for verification
    7. Comment
    8. Other
  • Classification: Feature Selection
    – Features:
      • Unigrams
      • + line length
      • + POS bigrams
      • + bigrams
      • + punctuation
      • + stemming
      • + no stemming
      • + rare words
      • + line length, punctuation and rare words
      • + no stop list
    – Algorithms:
      • Support Vector Machine
      • Decision Tree
      • Naive Bayes
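To make the train-and-test step concrete, the sketch below hand-rolls a multinomial Naive Bayes classifier over two of the listed feature types, unigrams and punctuation. It is a stand-in written with the standard library only and is not TagHelper, LightSide or Weka. The `features` function, the `NaiveBayes` class and the add-one smoothing are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple


def features(text: str) -> List[str]:
    """Unigrams plus punctuation marks (assumed tokenisation)."""
    return re.findall(r"[a-z0-9#@']+|[?!.,]", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def train(self, examples: List[Tuple[str, str]]) -> None:
        """Count classes and features from (text, label) pairs."""
        for text, label in examples:
            self.class_counts[label] += 1
            for feat in features(text):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)

    def predict(self, text: str) -> Optional[str]:
        """Return the label with the highest log posterior, or None if untrained."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)
            for feat in features(text):
                if feat in self.vocab:  # features unseen in training are skipped
                    score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Swapping the body of `features` (for example adding bigrams or removing punctuation) is the simplest way to compare feature sets in the manner the slide describes.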
  • Questions? Clare Llewellyn, University of Edinburgh