Clare Llewellyn
University of Edinburgh
Argumentation on the web - always vulgar
and often convincing?
User Generated Content
Various Conversations
Main points of discussion:

RM is bad / old / Australian / has power over politicians / owns newspapers

RM does / doesn’t understand the internet

Free content is good / bad

The joke belongs to Tim Vine or Stewart Francis

Wider context discussion – PIPA / SOPA, Leveson Inquiry, phone hacking, TVShack
The Problem
Can we somehow structure this data so we can read it
and add to it at the most relevant point?
Solutions?
Argumentation
A participant makes a claim that represents their position
The participant backs up that claim with evidence
A counter claim challenges the position
The composer of the original claim may evaluate their position.
Claim
Counter Claim
Evidence
Counter Evidence
Evaluation
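The five moves above can be held in a small tree, so that a new contribution is attached at the point it responds to. This is only a sketch: the class names, the helper functions, the user names and the example texts are all invented for illustration and are not taken from the talk.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Move(Enum):
    """The argumentative moves named on the slide."""
    CLAIM = "claim"
    EVIDENCE = "evidence"
    COUNTER_CLAIM = "counter claim"
    COUNTER_EVIDENCE = "counter evidence"
    EVALUATION = "evaluation"


@dataclass
class Node:
    """One contribution, plus the contributions that respond to it."""
    move: Move
    text: str
    author: str  # hypothetical user handle
    replies: List["Node"] = field(default_factory=list)

    def add(self, move: Move, text: str, author: str) -> "Node":
        """Attach a response to this contribution and return it."""
        child = Node(move, text, author)
        self.replies.append(child)
        return child


def outline(node: Node, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, one per contribution."""
    lines = ["  " * depth + f"[{node.move.value}] {node.author}: {node.text}"]
    for reply in node.replies:
        lines.extend(outline(reply, depth + 1))
    return lines


# Invented example built around one of the discussion points listed earlier.
root = Node(Move.CLAIM, "Free content is good", "user_a")
root.add(Move.EVIDENCE, "More people read the article", "user_a")
counter = root.add(Move.COUNTER_CLAIM, "Free content is bad", "user_b")
counter.add(Move.COUNTER_EVIDENCE, "Nobody pays the writers", "user_b")
root.add(Move.EVALUATION, "Fair point, it depends who pays", "user_a")

print("\n".join(outline(root)))
```

Each reply is stored under the contribution it answers, which is the "add to it at the most relevant point" idea from The Problem slide.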
Macro / Micro Argumentation
Micro-level:
Simple claim
Qualified claim
Grounded claim
Grounded and qualified claim
Non-argumentative moves
Macro-level:
Argument
Counter argument
Integration (reply)
Non-argumentative moves
Weinberger and Fischer (2006)
Methodology*
* Adapted from Bal & Saint-Dizier (2009) and Mochales & Moens (2009, 2011)
1. Identify discussions on different topics
2. Identify spans of text that represent the core points in the discussion
3. Classify into a structure so as to define the relationships between spans of text
4. Present this information to users
Data Sets
Hand annotated corpus of tweets from the London Riots (7729)
www.analysingsocialmedia.org
Comments from the Guardian newspaper (partially hand annotated for topic)
Tweets with the #OR2012 (5416)
1. Topic Identification
• Extract individual discussion
• Unsupervised clustering – very objective
• Selection of algorithm
– Unigram / Bigram Frequency
– Incremental Clustering
– K-means
– Topic modelling
Possible tools
– NLTK (nltk.org)
– Weka (www.cs.waikato.ac.nz/ml/weka/)
– Mallet (mallet.cs.umass.edu)
– Twitter Workbench (www.analysingsocialmedia.org/projects)
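As an illustration of incremental clustering over unigram frequencies, here is a minimal sketch using only the Python standard library rather than any of the tools listed. The cosine similarity measure, the 0.3 threshold, the tokeniser and the four sample messages are assumptions made for the example, not details from the talk.

```python
import math
import re
from collections import Counter


def unigrams(text):
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9#@']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def incremental_cluster(texts, threshold=0.3):
    """Single pass: join the most similar cluster, or start a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [index]}
    for i, text in enumerate(texts):
        vec = unigrams(text)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)  # summed counts act as the centroid
            best["members"].append(i)
    return [cluster["members"] for cluster in clusters]


# Invented sample messages echoing two of the discussion points.
texts = [
    "free content is good for readers",
    "the joke belongs to tim vine",
    "free content is bad for newspapers",
    "tim vine told that joke first",
]
print(incremental_cluster(texts))
```

No stop list is applied here, so function words such as "is" and "for" contribute to the similarity. On real tweets the threshold and the tokenisation would both need tuning.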
Example Clusters
Topic Modelling / Incremental Clustering
Evaluation
Are you doing what a human would do?
Results for comments data: [results not included in this transcript]
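One simple way to ask whether a clustering does what a human would do is to compare the cluster assignments with hand-annotated topics. The sketch below computes cluster purity. The choice of purity is an assumption made for illustration; the transcript does not say which measure was used, and the labels are invented.

```python
from collections import Counter


def purity(cluster_ids, human_labels):
    """Fraction of items whose human label is the majority label of their cluster."""
    if len(cluster_ids) != len(human_labels):
        raise ValueError("cluster_ids and human_labels must have the same length")
    if not cluster_ids:
        return 0.0
    by_cluster = {}
    for cluster, label in zip(cluster_ids, human_labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    agree = sum(max(counts.values()) for counts in by_cluster.values())
    return agree / len(cluster_ids)


# Invented example: six comments, two automatic clusters, three human topics.
cluster_ids = [0, 0, 0, 1, 1, 1]
human_labels = ["rm", "rm", "joke", "joke", "joke", "free"]
print(round(purity(cluster_ids, human_labels), 3))
```

Purity rises as the number of clusters grows (one cluster per item always scores 1.0), so it is best read alongside the number of clusters produced.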
2. Text Span Identification
Define a set of rules that allows the extraction of macro level argumentation
With annotated text you can use machine learning
With non-annotated text you can define rules – is there something specific in the
language that indicates a claim / counter claim?
Claim
Counter Claim
Rules production
Method:
Rules are a generalisation from a large amount of data (14000 quotes)
Use Words / POS / Negation / Symbols
Use the rules to find these patterns where not explicitly mentioned in text
Examples:
– Before:
• @USERNAME:
– After:
• i don't
• i think you
• PRP VBP RB (Personal pronoun, Verb non-3rd-person singular present, Adverb)
– Both
• START X i 'm not
Tools:
LT-TTT2 www.ltg.ed.ac.uk/software/
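The word-level cues in the examples can be tried out with a few lines of standard-library Python. This is a rough sketch and not the LT-TTT2 rule set: the tokeniser, the function names and the test messages are invented, the POS pattern (PRP VBP RB) is left out because it needs a tagger, and the START and X markers are not modelled.

```python
import re

# Word sequences taken from the slide's examples.
WORD_CUES = [("i", "don't"), ("i", "think", "you"), ("i", "'m", "not")]


def tokenize(text):
    """Lower-case and split, separating the clitic so "i'm" becomes "i", "'m"."""
    text = text.lower().replace("\u2019", "'")
    text = re.sub(r"\bi'm\b", "i 'm", text)
    return re.findall(r"@\w+|'m\b|\w+(?:'\w+)?|[^\w\s]", text)


def contains(tokens, cue):
    """True if the cue's tokens occur consecutively anywhere in tokens."""
    n = len(cue)
    return any(tuple(tokens[i:i + n]) == cue
               for i in range(len(tokens) - n + 1))


def matched_rules(text):
    """Return the names of the cues found in one message."""
    tokens = tokenize(text)
    hits = []
    # "@USERNAME:" cue: the message opens by addressing another user.
    if len(tokens) > 1 and tokens[0].startswith("@") and tokens[1] == ":":
        hits.append("@USERNAME:")
    for cue in WORD_CUES:
        if contains(tokens, cue):
            hits.append(" ".join(cue))
    return hits


# Invented messages.
for message in ["@bob: I don't agree", "I'm not convinced", "Free content is good"]:
    print(message, "->", matched_rules(message))
```

A message that matches one or more cues is only a candidate counter claim. The slide describes the rules as a generalisation over 14000 quotes, so a real rule set would be derived from that data rather than written by hand as it is here.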
3. Classify into a structure
Method
Based on Rosé et al. (2008)
Use supervised machine learning to classify tweets into an argument structure
Using TagHelper tool kit (based on Weka)
– www.cs.cmu.edu/~cprose/TagHelper.html
– LightSide lightsidelabs.com
– Decide on a machine learning algorithm
– Define feature sets
– Train and test
Data Set Tweets
Coded with the classification system:
1. Claim without evidence
2. Claim with evidence
3. Counter-claim without evidence
4. Counter-claim with evidence
5. Implicit request for verification
6. Explicit request for verification
7. Comment
8. Other
Classification – Feature Selection
Features
Unigrams
+ line length
+ POS Bigrams
+ bigrams
+ punctuation
+ stemming
+ no stemming
+ rare words
+ line length, punctuation and rare words
+ no stop list
Algorithms
Support Vector Machine
Decision Tree
Naive Bayes
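To make the feature and algorithm choices concrete, here is a small Naive Bayes classifier over unigram, punctuation and line-length features, written with the standard library only. It is a sketch and not the TagHelper or LightSide setup: the class and function names, the length cut-off of 8 tokens, the add-one smoothing and the four training messages are assumptions made for the example. Naive Bayes is used because it is the easiest of the three listed algorithms to show in a few lines, not because the slides report it as the best performer.

```python
import math
import re
from collections import Counter, defaultdict


def features(text):
    """Unigrams, ? and ! as punctuation features, and a coarse length feature."""
    tokens = re.findall(r"[a-z0-9@#']+|[?!]", text.lower())
    feats = Counter(tokens)
    feats["LEN_LONG" if len(tokens) > 8 else "LEN_SHORT"] += 1
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = features(text)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = features(text)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for feat, count in feats.items():
                if feat in self.vocab:  # unseen features are ignored
                    seen = self.feat_counts[label][feat]
                    score += count * math.log((seen + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training messages, labelled with two codes from the scheme above.
train = [
    ("police are at the scene", "Claim without evidence"),
    ("shops are on fire", "Claim without evidence"),
    ("is this true ?", "Explicit request for verification"),
    ("can anyone confirm this ?", "Explicit request for verification"),
]
model = NaiveBayes().fit([t for t, _ in train], [label for _, label in train])
print(model.predict("can someone confirm ?"))
print(model.predict("cars are on fire"))
```

Swapping feature sets in and out, as in the list above, amounts to editing `features` and retraining. With a corpus of realistic size the comparison would be made on held-out data rather than on hand-picked messages.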
QUESTIONS?
Clare Llewellyn
University of Edinburgh
c.a.llewellyn@sms.ed.ac.uk

Clare llewellyn Lasiuk July 5th 2013
