MINING ARGUMENTS
FROM ONLINE DEBATING SYSTEMS
6TH MLDM.IT WORKSHOP @ AI*IA 2017
Andrea Pazienza, Stefano Ferilli
14th November 2017 – Bari, Italy
Overview
1. Introduction to Argumentation
2. Mining Argumentation Graphs from Online Debating
Systems
3. Application to a Reddit Thread
4. Conclusions and future works
INTRODUCTION TO ARGUMENTATION
Introduction to Argumentation
Argument Mining
Analyzing argumentation
structures in the discourse
Abstract
Argumentation
A framework for practical
and uncertain reasoning
able to cope with partial and
inconsistent knowledge
Argument Mining
Where are we?
Computational linguistics: statistical or rule-based modeling of natural
language
NLP: interactions between computers and human languages
Data mining: automatic extraction of data (often numerical)
Text mining: automatic extraction of data from natural language texts
Argument mining: automatic extraction of arguments from natural language
texts
Why do it?
Big Data problem: the ever-increasing amount of data on the web makes
manual analysis of this content increasingly infeasible.
How to process Big Data?
By mining information
Argument Mining
Existing approaches in NLP primarily focus on the micro-level
(monological) rather than the macro-level (dialogical) perspective
What can we mine?
Sentiment Analysis: mining attitudes towards something (positive, neutral,
negative)
Opinion Mining: mining opinions about something
Graph-based dialogical model
(Macro-level) Argument Mining
Argument Mining
Segmenting texts into argumentative units
Identifying relations between units
Analyzing Polarity and classifying Stance
◦ Sentiment Analysis: e.g. number of
people liking new Mercedes vs. number
of people not liking new Mercedes
◦ Opinion Mining: e.g. people thinking
new Mercedes is too expensive, people
thinking new Mercedes is reliable
Mining arguments for and against people’s
opinions
◦ e.g. not only the information that people
like Mercedes but also why: people like
new Mercedes, because they think it is
reliable
Abstract Argumentation
Argumentation Framework (AF)
encapsulates arguments as nodes in a digraph
connects them through a relationship of attack
defines a calculus of opposition for determining
what is acceptable
allows a range of different semantics
[Figure: example argumentation digraph over the arguments a, b, c, d, e, f, g, h]
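As a rough illustration of how a semantics decides "what is acceptable" in an AF, the sketch below computes the grounded extension, one of the classical extension-based semantics, by iterating from the empty set. The function name and the toy attack relation are assumptions made for this example; they are not taken from the figure on the slide.

```python
from typing import Dict, Set, Tuple


def grounded_extension(arguments: Set[str],
                       attacks: Set[Tuple[str, str]]) -> Set[str]:
    """Return the grounded extension of an abstract argumentation framework.

    `attacks` holds (attacker, target) pairs. The set of accepted arguments
    is grown from the empty set until it stops changing.
    """
    attackers: Dict[str, Set[str]] = {a: set() for a in arguments}
    for src, dst in attacks:
        attackers[dst].add(src)

    accepted: Set[str] = set()
    while True:
        # Arguments attacked by an already accepted argument are defeated.
        defeated = {dst for src, dst in attacks if src in accepted}
        # An argument is defended when all of its attackers are defeated.
        defended = {a for a in arguments if attackers[a] <= defeated}
        if defended == accepted:
            return accepted
        accepted = defended


# Hypothetical toy framework: a attacks b, b attacks c.
toy_arguments = {"a", "b", "c"}
toy_attacks = {("a", "b"), ("b", "c")}
print(sorted(grounded_extension(toy_arguments, toy_attacks)))
```

In the toy framework, a is unattacked and therefore accepted, b is defeated by a, and c is accepted because its only attacker is defeated.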
Generalizations of Argumentation Frameworks
Bipolar: add support relation
Weighted: add weights on attacks
Values, Preferences
etc.
Extension-based vs Ranking-based Semantics
extension-based semantics do not fully exploit the weight of relations
ranking-based semantics rank arguments from the most to the least acceptable
MINING ARGUMENTATION GRAPHS
FROM ONLINE DEBATING SYSTEMS
Online Debating Systems (ODS)
Classical Thread Discussion
A tree (i.e., hierarchical)
structure consisting of
Root node: the discussion
topic, i.e. the major claim,
shared by a user, followed by
Comments from other users
Each of these comments has
as children the comments in
response to it.
Step 1: Identification of arguments
Each piece of content (the root post or a comment) is an abstract argument
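The tree structure described above can be sketched as a small data type, with every node treated as one abstract argument and every reply contributing one (parent, reply) pair. The class name `Comment`, the field names, and the toy thread are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Comment:
    """One node of the discussion tree, treated as one abstract argument."""
    arg_id: str
    text: str
    replies: List["Comment"] = field(default_factory=list)


def reply_pairs(node: Comment) -> Iterator[Tuple[str, str]]:
    """Yield (parent_id, reply_id) for every reply in the thread, depth first."""
    for reply in node.replies:
        yield node.arg_id, reply.arg_id
        yield from reply_pairs(reply)


# Hypothetical thread: the root is the major claim, followed by comments.
thread = Comment("a0", "Major claim posted by the thread author", [
    Comment("a1", "First top-level comment", [Comment("a2", "Reply to a1")]),
    Comment("a3", "Second top-level comment"),
])
pairs = list(reply_pairs(thread))
print(pairs)
```

Each pair is a candidate relation between two consecutive comments; whether it becomes an attack or a support is decided in the next step.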
Problem Approach
Step 2: Identification of relations between arguments
Sentiment and Tone Analysis
◦ systematically identify, extract and quantify Polarity
◦ 5 Polarity classes: very negative, negative, neutral, positive, very positive
◦ Stanford CoreNLP provides the sentiment tool
Text Similarity
◦ check if two arguments are addressing the same topic
◦ Semantic Similarity via word embeddings
◦ GloVe algorithm for obtaining vector representations for words
Given two abstract arguments $\alpha, \beta \in A$, representing two
consecutive comments of the online debate:
$$w(\alpha, \beta) = \mathrm{similarity}(\alpha, \beta) \cdot \mathrm{sentiment}(\beta) \tag{1}$$
$\mathrm{similarity}: A \times A \to [0, 1]$
$\mathrm{sentiment}: A \to [-1, 1]$
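A minimal sketch of equation (1) follows. Several details are assumptions, because the slides do not specify them: the five polarity classes are mapped linearly onto [-1, 1], a comment is represented by the mean of its word vectors, similarity is the cosine clipped to [0, 1], and the tiny hand-written embedding table stands in for real GloVe vectors. In practice the polarity class would come from a sentiment tool such as the one in Stanford CoreNLP.

```python
import math
from typing import Dict, List, Sequence

# Assumption: linear mapping of the five polarity classes onto [-1, 1].
POLARITY_SCORE: Dict[str, float] = {
    "very negative": -1.0, "negative": -0.5, "neutral": 0.0,
    "positive": 0.5, "very positive": 1.0,
}


def mean_vector(tokens: Sequence[str],
                embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the known tokens; empty list if none is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity clipped to [0, 1]; 0.0 for empty or zero vectors."""
    if not u or not v:
        return 0.0
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return max(0.0, min(1.0, dot / norm)) if norm else 0.0


def edge_weight(alpha_tokens: Sequence[str], beta_tokens: Sequence[str],
                beta_polarity: str,
                embeddings: Dict[str, List[float]]) -> float:
    """w(alpha, beta) = similarity(alpha, beta) * sentiment(beta)."""
    sim = similarity(mean_vector(alpha_tokens, embeddings),
                     mean_vector(beta_tokens, embeddings))
    return sim * POLARITY_SCORE[beta_polarity]


# Hypothetical two-dimensional embeddings, standing in for GloVe vectors.
toy_embeddings = {"ending": [1.0, 0.0], "finale": [1.0, 0.0], "boring": [0.0, 1.0]}
print(edge_weight(["ending"], ["finale"], "negative", toy_embeddings))
```

With these toy vectors the two comments are maximally similar, so the weight equals the sentiment score of the second comment; a reply on an unrelated topic would receive a weight of zero regardless of its polarity.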
Bipolar Weighted Argumentation Framework
Bipolar Weighted Argumentation Framework (BWAF)
attack relations with a negative
weight in the interval [−1, 0[
support relations with a positive
weight in the interval ]0, 1]
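The sign convention above can be illustrated with a short sketch that splits a weighted edge set into attacks and supports. The function name and the sample edges are hypothetical; dropping zero-weight edges is an assumption that follows from both intervals excluding 0.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (source argument, target argument)


def split_relations(weights: Dict[Edge, float]
                    ) -> Tuple[Dict[Edge, float], Dict[Edge, float]]:
    """Split weighted edges into attacks (weight < 0) and supports (weight > 0)."""
    attacks: Dict[Edge, float] = {}
    supports: Dict[Edge, float] = {}
    for edge, weight in weights.items():
        if not -1.0 <= weight <= 1.0:
            raise ValueError(f"weight {weight} of {edge} is outside [-1, 1]")
        if weight < 0:
            attacks[edge] = weight
        elif weight > 0:
            supports[edge] = weight
        # A weight of exactly 0 encodes no relation and is dropped.
    return attacks, supports


# Hypothetical weighted edges, not read off the figure.
sample = {("b", "a"): 0.7, ("e", "a"): -0.7, ("f", "c"): 0.0}
attacks, supports = split_relations(sample)
print(attacks, supports)
```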
[Figure: example BWAF over the arguments a to h, with edge weights between -0.7 and 0.9]
BWAF Ranking-based Semantics by means of Strength Propagation
APPLICATION TO A REDDIT THREAD
Application to a Reddit Thread
We consider a Reddit discussion of an episode of Black Mirror, a
popular TV series
[Figure: BWAF extracted from the Reddit thread, with nodes labelled from a0 to a85 and edge weights between -0.5 and 0.5]
Sentiment Polarity + Text
Similarity to build the BWAF
Argument acceptability via
sp-ranking Semantics
The construction procedure may
introduce some noise, but it is simple
and computationally fast, and the
instantiated argumentation model
remains quite reliable.
Application to a Reddit Thread
arg  sp       arg  sp    arg  sp    arg  sp    arg  sp       arg  sp
a4   1.45     a80  1.0   a59  1.0   a38  1.0   a13  1.0      a32  0.8176
a43  1.43     a78  1.0   a57  1.0   a36  1.0   a8   1.0      a34  0.76
a22  1.32     a76  1.0   a55  1.0   a35  1.0   a5   1.0      a11  0.7077
a10  1.108    a73  1.0   a53  1.0   a33  1.0   a58  0.9641   a39  0.68
a19  1.0855   a72  1.0   a51  1.0   a29  1.0   a28  0.9595   a56  0.67
a50  1.08     a71  1.0   a49  1.0   a26  1.0   a18  0.9589   a37  0.66
a48  1.05     a69  1.0   a47  1.0   a24  1.0   a74  0.954    a75  0.58
a9   1.0425   a66  1.0   a45  1.0   a23  1.0   a60  0.9488   a68  0.56
a17  1.0197   a65  1.0   a44  1.0   a21  1.0   a0   0.9225   a20  0.55
a27  1.0035   a64  1.0   a42  1.0   a16  1.0   a31  0.9047   a12  0.54
a85  1.0      a63  1.0   a41  1.0   a15  1.0   a54  0.8492
a84  1.0      a62  1.0   a40  1.0   a14  1.0   a61  0.84
The (collective) strength propagation along all paths ending at a node takes advantage of
the weights of the relations and, in particular, of the weighted support relations
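Once every argument has an sp score, producing the ranking is a plain sort from the most to the least acceptable argument. The sketch below uses six scores copied from the table; the function name and the tie-breaking by argument identifier are assumptions added for determinism. It does not compute the sp scores themselves.

```python
from typing import Dict, List

# Six (argument, sp) pairs copied from the table above.
sp_scores: Dict[str, float] = {
    "a12": 0.54, "a85": 1.0, "a4": 1.45,
    "a32": 0.8176, "a43": 1.43, "a22": 1.32,
}


def rank_arguments(scores: Dict[str, float]) -> List[str]:
    """Order arguments by decreasing score; ties are broken by identifier."""
    return sorted(scores, key=lambda arg: (-scores[arg], arg))


ranking = rank_arguments(sp_scores)
print(ranking)
```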
CONCLUSIONS AND FUTURE WORKS
Conclusions and Future Works
Argument Mining to extract arguments and relations between
them, to build a graph-based dialogical model
Considering the similarity between the comments, the sentiment
associated with them and their hierarchical structure, extract an
AF that models an online debate by identifying weighted attacks
and supports depending on their strength
To improve the quality of the argument graph construction,
further argument mining techniques may be exploited, although
this may drastically increase the computational cost.
Perspectives
Opinion mining:
understanding what people think about something VS understanding why
Going beyond critical thinking, i.e., a set of rational, deductive
arguments:
How to influence a real audience?
What is the role of emotions?
How to analyze human reasoning processes?
Big data:
social network posts, forums, blogs, product reviews, user comments on
newspaper articles, etc.
Deep learning:
fast and efficient machine learning algorithms for large, unlabelled corpora
e.g., word embeddings: automatically learned feature spaces encoding
high-level, rich linguistic similarity between terms
