Modeling Causal Reasoning in Complex Networks through NLP: an Introduction

How does successful causal communication work?
How do context and agents influence the relevance of causal features?
What about causal disagreements:
linguistic ambiguity? Cognitive failure? Network constraints?
And above all, how can we quantify all the interacting variables?
Luca Nannini 11th October 2019
Cog Sem - Ling AU Symposium

A little about me

Main questions:
How are causal representations reﬁned and updated collectively in
communication?
How do causal disagreements arise, and how do conversational partners
interact to align their interpretations of causal events?

Descriptive Analysis: LIWC + NLP
Predictive Analysis: Causal Inference
Network Interferences: Information Diffusion & Contagion

MA Thesis Research Project: Modeling mass entrainment as engagement and
semantic contagion in live-tweeting of the first 2016 U.S. presidential debate

MA Thesis Research Project: Mass attention as tweet volume, salient
moments of the debate

MA Thesis Research Project: What is NLP?
AWFUL CODES,
BEAUTIFUL
STORIES

Natural Language Processing (NLP) in data science is about analyzing large amounts of
text data to gain computational insight into how human language is used, at both the
lexical and the semantic level, synchronically or diachronically.
NLP tasks rely on machine learning models that can be trained and tested with supervised
learning (e.g. classification and regression problems) or with unsupervised learning (e.g.
clustering and pattern discovery).

Text information can be statistical: word counts, word frequencies, and sentence length are
a few of the lexical measures for quantifying the lexicon and its usage.
This information can be syntactic too, such as sentence chunking and part-of-speech
(POS) tagging.
On a more advanced level, text information can be semantic: text classification in NLP is
about identifying the topics in a text. It covers information retrieval, document ranking,
detecting whether an email is spam, and deciding whether a review is positive (sentiment
analysis). Related tasks include correcting the spelling of a term or suggesting different
verb tenses or nouns in a sentence.
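The statistical measures named above (word count, word frequency, sentence length) can be sketched with the Python standard library alone. This is a minimal illustration, not the pipeline used in the thesis: the function name `lexical_stats`, the naive sentence splitter, and the sample text are all assumptions made for the example.

```python
import re
from collections import Counter

def lexical_stats(text):
    """Return word count, word frequencies and mean sentence length."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption;
    # real tweets need a more robust splitter).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased word tokens: runs of letters, digits or apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {
        "word_count": len(tokens),
        "frequencies": Counter(tokens),
        "mean_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
    }

# Invented sample text, for illustration only.
sample = "The debate starts tonight. The debate is live!"
stats = lexical_stats(sample)
print(stats["word_count"], stats["frequencies"].most_common(2))
```

In practice a library such as NLTK would handle tokenization and POS tagging; the sketch only shows what is being counted.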

Topic modeling is the subfield of NLP that deals with finding semantic
clusters (topics) and word-association tendencies in text corpora.
The main models used are:
- Latent Semantic Analysis (LSA)
- Latent Dirichlet Allocation (LDA)
MA Thesis Research Project: What is NLP? What is topic modeling?

Latent Dirichlet Allocation: a generative probabilistic model that discovers
topic tendencies in text documents.
“Documents are represented as random mixtures over latent topics,
where each topic is characterized by a distribution over words”
(Blei, Ng, & Jordan, 2003).
“Don’t worry about it if you don’t understand”
Andrew Ng allows us to be dumb, no prob
MA Thesis Research Project: What is NLP? What is topic modeling? LDA?

Each word w in each document d comes from a topic t, and this t
is selected from a per-document distribution over the topics T.
So we have two matrices:
1. Θ_td = P(t|d), the probability distribution of topics in documents
2. Φ_wt = P(w|t), the probability distribution of words in topics
Latent: not known a priori, hidden in the data.
Dirichlet: a distribution over distributions (here, the distribution of T in D
and the distribution of W in T).
Allocation: given the Dirichlet priors, allocate t to d and the w of d to t.
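The generative story behind the two matrices can be sketched as a toy sampler. This is only an illustration of how LDA assumes documents are produced, not an inference algorithm and not the code used in the thesis. The vocabulary, the hyperparameter values `alpha` and `beta`, and all function names are assumptions for the example; the Dirichlet draw is built from normalized Gamma samples.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from a Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(n_docs, doc_len, vocab, n_topics, alpha=0.5, beta=0.5, seed=0):
    rng = random.Random(seed)
    # Phi: one word distribution P(w|t) per topic.
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    theta, docs = [], []
    for _ in range(n_docs):
        # Theta: the topic distribution P(t|d) of this document.
        theta_d = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            t = rng.choices(range(n_topics), weights=theta_d)[0]  # pick a topic
            w = rng.choices(vocab, weights=phi[t])[0]             # pick a word from it
            words.append(w)
        theta.append(theta_d)
        docs.append(words)
    return theta, phi, docs

# Hypothetical vocabulary, loosely inspired by the debate topics.
vocab = ["tax", "job", "police", "race", "email", "debate"]
theta, phi, docs = generate_corpus(n_docs=3, doc_len=8, vocab=vocab, n_topics=2)
print(docs[0])
```

Fitting LDA runs this story in reverse: given only the documents, it estimates the Θ and Φ that most plausibly generated them.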

- Text corpus: a collection of n documents
- Document: a mixture of the given topics, distributed in certain proportions
- Given a putative number of topics, the model estimates the distribution of the
keywords (w) together with the distribution of the topics
- The arrangement of words depends on known and unknown parameters: e.g. the number
of topics given, the variety of topics treated in the texts, the algorithm's tuning parameters

Distributional Hypothesis (J. R. Firth, 1957):
a linguistic hypothesis stating that words
occurring in similar lexical contexts
tend to have similar meanings
Word Embeddings - Classifying words’ co-occurrences
MA Thesis Research Project: What is NLP? What is topic modeling? LDA? FastText?

During training, sequences of word indices are
mapped to embedding vectors: dense,
real-valued, multidimensional vectors.
These dense vectors give each word a
location in a continuous vector space.
This continuous vector space is
lower-dimensional and preserves
semantic relationships, encoding them in
the embeddings' positions as distance and
vector direction.
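"Distance and direction" is usually measured with cosine similarity between two embedding vectors. The sketch below uses hand-made 3-dimensional vectors purely for illustration; real embeddings have hundreds of dimensions and are learned from data, and the numbers here are invented.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, invented for this example.
embeddings = {
    "tax":    [0.9, 0.1, 0.0],
    "money":  [0.8, 0.2, 0.1],
    "police": [0.0, 0.9, 0.3],
}

sim_tax_money = cosine_similarity(embeddings["tax"], embeddings["money"])
sim_tax_police = cosine_similarity(embeddings["tax"], embeddings["police"])
print(sim_tax_money > sim_tax_police)
```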
Bag-of-Words (BOW)
Tokenization
(Normalization, stemming/lemmatization)
↓
Vectorization
(Assign numerical values through feature
selection)
=
Word Embeddings
Vector representation of tokens in a continuous
multidimensional vector space
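The tokenization and vectorization steps of a bag-of-words model can be sketched as follows. Normalization is reduced to lowercasing, and stemming/lemmatization is left out; the function names and the two sample texts are assumptions for the example.

```python
import re
from collections import Counter

def tokenize(text):
    # Normalization here is just lowercasing; stemming/lemmatization is omitted.
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(documents):
    """Return a sorted vocabulary and one count vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors

# Invented sample texts, for illustration only.
docs = ["Trump interrupts Clinton", "Clinton answers; Clinton smiles"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
print(vectors)
```

Unlike word embeddings, these count vectors are sparse and as wide as the vocabulary, and they carry no notion of similarity between different words.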

FastText: an extension of Word2Vec's architecture released by Facebook AI Research in 2016 (Joulin, Grave,
Bojanowski, & Mikolov, 2016). FastText is also an open-source library for text representation and text
classification, with pre-trained word vector models available in several natural languages.
The main difference from Word2Vec is that FastText represents each word as a bag of character n-grams.
This lets it produce vectors for rare words despite morphological inflection or other lexical
derivation (prefixes or suffixes).
As a classifier, FastText predicts a category rather than a word: it uses a shallow architecture similar to the
CBOW model, in which the target word is replaced by a label. The architecture can use a hierarchical softmax
instead of a full softmax over labels, which makes training faster.
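The subword idea can be sketched with a small function that extracts character n-grams after wrapping the word in boundary markers `<` and `>`, as described by Bojanowski et al. (2017). This is an illustration of the n-gram extraction only, not the FastText library itself; the function name and the defaults are assumptions for the example.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']

# Inflected forms share most of their n-grams, which is why a rare form
# can still be placed close to a frequent one.
shared = set(char_ngrams("debate")) & set(char_ngrams("debates"))
```

A word vector is then built from the vectors of its n-grams, so an unseen or rare word still receives a representation from the subwords it shares with known words.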

MA Thesis Research Project:
FastText word embedding of tweets

1. The debate event itself (e.g. ‘tonight’, ‘presidential’,
‘debatenight’, ‘trump’, ‘clinton’, ‘show’).
2. A ‘social healing’ topic area with terms regarding race
relations, police, and marginalized communities (‘race’, ‘police’,
‘plan’, ‘community’, ‘order’).
3. ‘Achieving prosperity’, with words on tax policy, job
creation, economic deals, and business investments (e.g.
‘job’, ‘tax’, ‘money’, ‘business’, ‘federal’, ‘pay’, ‘trillion’).
4. Terms about Clinton, produced during the salient
moments of the first two topic segments (‘hillary’, ‘hrc’,
‘email’, ‘fact-check’).
5. Live commentary on the candidates' image, mostly in coarse
language (‘dumb’, ‘idiot’, ‘crazy’, ‘joke’, ‘fuck’).
6. Live commentary on the debate per se (e.g. ‘interrupt’,
‘moderator’, ‘mention’, ‘speak’, ‘started’), plus references to
drinking games (e.g. “drink a shot every time someone
says…”).
7. The most common verbs used.

Surely, it sounds obvious that people tweet for fun, to provide
informal live commentary on the candidates' personas, to assess
their political leaning, and to discredit the opponents.
But live-tweeting is influenced by several variables.
How do you account for...
- Social cognition / behavioral components of leadership
- Network structures
- Time scale of engagement
- Group polarization (opinion leadership), selective exposure
- News diet, social cohesion

Long story short - Limitations of my study
There is a pilot group (A) that creates information and a target
group (B) that receives and tailors it according to several
interacting endogenous and exogenous variables.
How to quantify them?
Linguistic, cognitive, and network variables
Can we forecast how A's rhetorical patterns will impact B?
B is composed of heterogeneous subgroups with different
exposure: how to detect them? What influences them?

My current project: OLaV
1. Modeling Causal Inference Computationally
● Social Media Mining
● Topic Modeling (NLP)
● LIWC Causal Analysis
● Train & Test Classifier
● Implementation for SCM (Structural Causal Models)
Data Mining → Preprocessing → Wrangling → Visualization

RQ1: What linguistic, discursive, and interactional
patterns characterize pro- and anti-vaccine posts on
social media?
RQ2: How do anti-vaccine proponents construct
alternative causal explanations for recent
vaccine-related events like global measles outbreaks?
RQ3: How does the particular packaging of causal
information about vaccine-preventable outbreaks affect
subjects’ interpretation of the information?
OLaV “Online Language of Vaccines: A mixed-methods
cross-cultural study of the vaccination debate on social media”
AU LICS Department - Alexandra Regina Kratschmer, Rebekah
Brita Baglini, Byurakn Ishkhanyan, Ana Paulla Braga Mattos
Check us out on:
- Twitter, @OLaV_AU
- GitHub, olav-au.github.io/project/

Our current approach:
LIWC + NLP = detecting vaccine stances in tweets
Take the 10% of the vaccine tweet dataset with the highest LIWC
causation values →
Pipeline: train a classifier for detecting causal stances, i.e.
assessing polarity (pro- / anti-) through the association of lexical
causatives with other lexicon →
Integrate a Structural Causal Model for retrieving causal dynamics
Linguistic Inquiry and Word Count (LIWC)
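The first step of the approach, keeping the 10% of tweets with the highest causation score, could look roughly like this. It is a sketch under assumptions: the record layout, the field name `cause`, and the scores are hypothetical stand-ins for whatever a LIWC export provides, and LIWC itself is not called here.

```python
def top_fraction(records, key, fraction=0.10):
    """Keep the given fraction of records with the highest value for `key`."""
    ranked = sorted(records, key=lambda r: r[key], reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]

# Hypothetical records: 'cause' stands in for a per-tweet LIWC causation score.
scores = [0.0, 2.5, 7.1, 1.2, 0.0, 3.3, 9.4, 0.5, 4.0, 6.2,
          1.1, 0.0, 5.5, 2.2, 8.8, 0.3, 3.9, 1.7, 0.9, 4.4]
tweets = [{"id": i, "cause": s} for i, s in enumerate(scores)]

selected = top_fraction(tweets, "cause")
print([t["id"] for t in selected])
```

The selected subset would then serve as training material for the stance classifier.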

NLP methods are mostly descriptive:
1. Scraping text data online
2. Preprocessing it (a delicate task)
3. Analyzing it, mostly with already tuned models
4. Presenting the results
BUT modeling causal reasoning is not descriptive:
It’s about retrieving and quantifying inferential processes →
i.e. modeling the causes that contributed to producing the actual effect →
i.e. the linguistic, cognitive, and network features that contributed to shaping a given
linguistic and/or rhetorical pattern

The Fundamental Problem of Causal Inference, Rubin 1988

What is the causal effect of a treatment on a particular
individual, as measured by an outcome?
Problem: we cannot observe the counterfactuals
from a single outcome, so we have to make inferences
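A small simulation makes the problem concrete. Each unit has two potential outcomes, but only the one matching its actual treatment is ever observed, so the individual effect is never seen directly. With randomized assignment, the difference in group means still recovers the average effect. Everything below is simulated: the true effect of 2.0, the sample size, and the variable names are assumptions for the example.

```python
import random

rng = random.Random(42)
units = []
for _ in range(1000):
    y0 = rng.gauss(0.0, 1.0)          # potential outcome without treatment
    y1 = y0 + 2.0                     # potential outcome with treatment (true effect = 2)
    treated = rng.random() < 0.5      # randomized assignment
    observed = y1 if treated else y0  # only one potential outcome is ever seen
    units.append((treated, observed))

treated_outcomes = [y for t, y in units if t]
control_outcomes = [y for t, y in units if not t]

# Difference in means: an estimate of the average treatment effect.
ate_estimate = (sum(treated_outcomes) / len(treated_outcomes)
                - sum(control_outcomes) / len(control_outcomes))
print(round(ate_estimate, 2))
```

With observational data such as tweets there is no randomization, which is why additional causal assumptions are needed.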

Ladder of Causation, Pearl 2018
I. Association: observed co-occurrence, which may carry no causal implications
II. Intervention: assessing causality by experimentally performing some action
that affects one of the observed events
III. Counterfactual: inferring an alternate causal version of a past event
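The gap between the first two rungs can be illustrated with a toy simulation in which a hidden common cause Z drives both X and Y, while X has no effect on Y at all. Observationally X and Y are strongly associated; under intervention (setting X by hand) the association disappears. The probabilities and names are invented for the example.

```python
import random

def simulate(n, do_x=None, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                      # hidden common cause
        if do_x is None:
            x = rng.random() < (0.9 if z else 0.1)  # observational: X listens to Z
        else:
            x = do_x                                # intervention: X is set by hand
        y = rng.random() < (0.8 if z else 0.2)      # Y depends on Z only, never on X
        rows.append((x, y))
    return rows

def p_y_given_x(rows, x_value):
    selected = [y for x, y in rows if x == x_value]
    return sum(selected) / len(selected)

observed = simulate(20000)
association = p_y_given_x(observed, True) - p_y_given_x(observed, False)

effect = (p_y_given_x(simulate(20000, do_x=True), True)
          - p_y_given_x(simulate(20000, do_x=False, seed=1), False))
print(round(association, 2), round(effect, 2))
```

The counterfactual rung goes one step further and asks what would have happened to a specific unit had X been different, which requires a full structural model rather than data alone.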

[Diagram: Electricity / Fire → Smoke / CO2 → Alarm signal / Loud beeping → Irritation / Headache.
How to solve the problem? IF the problem is ...: Call firefighters - Extinguish it;
Turn it off - Burn your soul in hell (side effects).]

A few questions:
How do we interpret the prior causes? How do we give
weights to them and their collateral effects? How do we use
and negotiate these causal explanations?

“A causal structure entails a probability model, but it
contains additional information not contained in the
latter. Causal reasoning [...] denotes the process of
drawing conclusions from a causal model, similar to
the way probability theory allows us to reason about
the outcomes of random experiments. However, since
causal models contain more information than
probabilistic ones do, causal reasoning is more
powerful than probabilistic reasoning, because causal
reasoning allows us to analyze the effect of
interventions or distribution changes.”

A causal graph is typically
represented as a Directed
Acyclic Graph (DAG), where
the directed edges represent
the direction of causal
inﬂuences between variables,
which are represented as
vertices.
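A causal graph of this kind can be stored as a mapping from each cause to its direct effects. The sketch below checks acyclicity and returns a causal ordering with Kahn's algorithm. The edges loosely follow the fire-alarm example and are hypothetical; the function name is an assumption too.

```python
from collections import deque

# Hypothetical edges, loosely following the fire-alarm example: cause -> effects.
causal_graph = {
    "fire": ["smoke"],
    "smoke": ["alarm"],
    "alarm": ["beeping"],
    "beeping": ["irritation"],
    "irritation": [],
}

def causal_order(graph):
    """Return vertices so that every cause precedes its effects (Kahn's algorithm)."""
    indegree = {node: 0 for node in graph}
    for effects in graph.values():
        for effect in effects:
            indegree[effect] = indegree.get(effect, 0) + 1
    queue = deque(node for node, degree in indegree.items() if degree == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for effect in graph.get(node, []):
            indegree[effect] -= 1
            if indegree[effect] == 0:
                queue.append(effect)
    if len(order) != len(indegree):
        raise ValueError("graph contains a directed cycle, so it is not a DAG")
    return order

print(causal_order(causal_graph))
```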
More commonly, however, the true data-generating process
is more likely to correspond to a directed acyclic graph (DAG)
model. DAGs do not share the limitations of chain graphs and
have been used for decades to guide inference and modeling,
especially for causal inference (Pearl, 2000).
A sequence of non-repeating vertices (V1, . . . , Vk) is called a path if
for every i = 1, . . . , k − 1, Vi and Vi+1 are connected by an edge.
A path is partially directed if there exists an ordering of the vertices
such that all directed edges in the path point towards the vertex
with a larger index.
A partially directed path is directed if it contains no undirected
edges.
A mixed graph contains a partially directed cycle if it contains a
partially directed path with a directed edge from the last to the ﬁrst
node in the path.
A mixed graph with no partially directed cycles is called a chain
graph (CG). A chain graph without undirected edges is called a
directed acyclic graph (DAG), and a chain graph without directed
edges is an undirected graph (UG).
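These definitions can be turned into a small classifier for mixed graphs. The sketch relies on a standard equivalent characterization, stated here as an assumption rather than taken from the slides: a mixed graph has no partially directed cycle exactly when no directed edge lies inside a connected component of the undirected part, and the directed edges between those components form no cycle. Function and variable names are hypothetical.

```python
def classify_mixed_graph(nodes, directed, undirected):
    """Classify a mixed graph as 'DAG', 'UG', 'chain graph' or 'not a chain graph'."""
    # 1. Chain components: connected components of the undirected part (union-find).
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in undirected:
        parent[find(a)] = find(b)

    # 2. A directed edge inside one component closes a partially directed cycle.
    component_edges = {}
    for a, b in directed:
        ca, cb = find(a), find(b)
        if ca == cb:
            return "not a chain graph"
        component_edges.setdefault(ca, set()).add(cb)

    # 3. The graph over components must be acyclic (depth-first search).
    state = {}

    def has_cycle(c):
        state[c] = "open"
        for nxt in component_edges.get(c, ()):
            if state.get(nxt) == "open" or (nxt not in state and has_cycle(nxt)):
                return True
        state[c] = "done"
        return False

    if any(c not in state and has_cycle(c) for c in list(component_edges)):
        return "not a chain graph"
    if not undirected:
        return "DAG"
    if not directed:
        return "UG"
    return "chain graph"

print(classify_mixed_graph(["a", "b", "c"], [("a", "b")], [("b", "c")]))
```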

Break?
But before that, let’s choose a keyword for some live data mining.

What caused A to agree/disagree with B? Can we build a model to forecast
and retrieve rhetorical behaviors, causal reasoning, and alignments?
Lexical Analysis:
● Linguistic Inquiry and Word Count (LIWC)
● NLTK: word counts & frequencies
Semantic Analysis:
● Comparison between text corpora:
○ Soft cosine similarity (softcossim)
○ KL divergence
● Topic Modeling:
○ Latent Semantic Analysis
○ Latent Dirichlet Allocation
● Sentiment Analysis
● Word Embeddings:
○ Word2Vec
○ GloVe
○ FastText
● Sentence Embeddings:
○ FastText
○ Doc2Vec
○ Sent2Vec
Causal Reasoning:
● Structural Equation Models
● Chain Graphs
○ Directed Acyclic Graphs (DAGs)
Natural Language Understanding:
● CommonSense Inference (semantic entailment):
○ Event2Mind
○ ATOMIC
○ SWAG
● Reading Comprehension, Sentence Prediction:
○ Google’s BERT
○ OpenAI’s GPT-2
○ ELMo
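Of the corpus-comparison measures listed above, KL divergence is easy to sketch from scratch: estimate a smoothed unigram distribution for each corpus over a shared vocabulary, then sum p·log(p/q). The add-one smoothing, the function names, and the two tiny "corpora" are assumptions made for the example.

```python
import math
from collections import Counter

def unigram_distribution(tokens, vocabulary, smoothing=1.0):
    """Smoothed unigram probabilities, so that no word has zero probability."""
    counts = Counter(tokens)
    total = len(tokens) + smoothing * len(vocabulary)
    return {w: (counts[w] + smoothing) / total for w in vocabulary}

def kl_divergence(p, q):
    """KL(p || q) in nats; both distributions must share the same support."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

# Invented token lists, for illustration only.
corpus_a = "vaccines cause immunity vaccines work".split()
corpus_b = "vaccines cause harm because toxins".split()
vocabulary = sorted(set(corpus_a) | set(corpus_b))

p = unigram_distribution(corpus_a, vocabulary)
q = unigram_distribution(corpus_b, vocabulary)
divergence = kl_divergence(p, q)
print(round(divergence, 3))
```

KL divergence is asymmetric: KL(p‖q) and KL(q‖p) generally differ, so the direction of comparison should be stated when reporting it.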

What is NLU? Which models could be integrated into an ML pipeline for advancing causal inferences?
Natural Language Understanding (NLU) in NLP is about creating models (e.g. chatbots)
that, having analyzed huge amounts of text data, may be able to understand the
semantics of natural language and predict our linguistic (and semantic) habits.
CommonSense Inference (semantic entailment):
○ Event2Mind
○ ATOMIC
○ SWAG
Reading Comprehension, Sentence Prediction:
○ Google’s BERT
○ ELMo
Natural Language Inference (NLI) in NLP is
the task of determining whether a
“hypothesis” is true (entailment), false
(contradiction), or undetermined (neutral)
given a “premise”.

ConceptNet
A semantic map for AI

CommonSense models:
ATOMIC - Commonsense
reasoning IF - THEN

CommonSense models:
Event2Mind

SWAG: Situations With
Adversarial Generations

Reading Comprehension,
Sentence Prediction models:
OpenAI’s GPT-2
Reading Comprehension,
Sentence Prediction models:
Google’s BERT

2. Modeling Online Interaction: Endogenous factors
● Qualitative Online Discourse Analysis
● Detect Linguistic & Dialogical Features
○ Lexical choice
○ Argument choice
○ Information Contagion (e.g. URLs, retweets, mentions)
● Causal disagreement:
○ Linguistic?
○ Cognitive?
It’s about meaning production per se and meaning in context:
Linguistics (grammar)
↓
Semantics (denotation/connotation)
↓
Pragmatics (speech acts, context constraints, etc.)

G. Frege - Sense & Reference, 1892
It’s not about language per se, BUT it’s about
how we use language in context:
What’s the reference? What’s the intention?
Reference (extension, denotation) → what the expression refers to
Sense (intension, connotation) → meaning of the expression
P.S. Think about Peirce, Barthes & Eco’s concept of semiosis.
Think about pragmatics.

L. Wittgenstein - Philosophical Investigations, 1953
Meaning is use: utterances are only
explicable in relation to the activities in
which they play a role; the meaning of a
word is revealed in its use.
He called these activities ‘language-games’.
The rules are learned and made manifest
by actually playing the game.
E. Berne - Games People Play, 1964
Transactional Analysis: meaning is not set
in stone. It does not rely on a prescriptive
level (linguistic or semantic); it is
negotiated and constrained by
psychological roles and implicatures that
we consciously and unconsciously embrace.

J. Searle - Speech Acts, 1969
Not all pseudo-statements are intended
(or are intended only in part) to record or impart
straightforward information about some
facts. They are intended to do something
quite different, as with “performative verbs”,
e.g. I declare, I christen this, I object, I
sentence, etc.
Locutionary Act → the act has meaning
Illocutionary Act → the act has force in saying something
Perlocutionary Act → the act achieves effects

P. Grice - Maxims, 1975
● Quantity: In answer to "Tell me about him!":
He has a nice personality. [≠ informative]
● Quality: In response to something stupid someone did:
That was brilliant! [≠ true]
● Relation: In response to "Can I go out and play?":
Did you finish your homework? [≠ pertinent]
● Manner: A wedding ring should be tight; after all, its purpose is
to limit your circulation. [≠ unambiguous]
How do we assess sarcasm, irony, and other weird
psychopathic manipulations?

That’s pragmatics, folks #1

That’s pragmatics, folks #2

1. Modeling Causal Inference Computationally
● Social Media Mining
● Topic Modeling (NLP)
● LIWC Causal Analysis
● Train & Test Classifier
● Implementation for SCM (Structural Causal Models)
2. Endogenous factors
● Qualitative Online Discourse Analysis
● Detect Linguistic & Dialogical Features
○ Lexical choice
○ Argument choice
○ Information Contagion (e.g. URLs, retweets, mentions)
● Causal disagreement:
○ Linguistic?
○ Cognitive?
3. Exogenous factors
● Social network structure, ties, engagement, news sources and availability
● Benchmark findings of Linguistic & Dialogical Features
● Integration, optimization & validation of the classifier

Visualizing Twitter’s networks
Hoaxy is an open platform developed at
Indiana University to track the spread of
claims and fact-checking.
A search engine, interactive visualizations,
and open-source software are freely available
(hoaxy.iuni.iu.edu). The data are accessible
through a public application programming
interface (API).
Enter a keyword and search Twitter content (from the last week) or
Hoaxy, i.e. articles from misinformation and fact-checking sources.
You can even select up to 20 related articles and generate a
timeline with a network graph.

Back to Twitter and Complex Networks
Network Interference:
● Network Structure
○ Social Network Ties
○ Algorithmic popularity bias
● Engagement
○ Information Overload
○ Responsiveness
○ Interests
● News Diet
○ News sources
○ News agenda
○ Low-credibility content
● Network Exposure
○ Filter Bubbles
○ Echo Chambers
● Behavioral patterns
○ Selective Exposure (Homophily)
○ Epistemic Authority

Back to Twitter and Complex Networks: Information Reception
Selective Exposure is influenced by online behaviour:
- Recommendation systems (search engines, previous online history)
- Homophily (tendency to group according to interests and commonalities)
- Algorithmic bias
- Confirmation bias
↓
Filter Bubbles
- Info sources are constrained
- Confirmation bias consolidated
- Info patterns strongly repeated
↓
Echo Chambers

Echo Chambers → Active shift: Group Polarization
Group polarization (C. R. Sunstein, 2002) is, on a basic level, the social
tendency toward a
“predictable shift within a group discussing a case or a problem. As
the shift occurs, groups, and group members move and coalesce, not
toward the middle of antecedent dispositions, but toward a more
extreme position in the direction indicated by those dispositions.
The effect of deliberation is both to decrease variance among group
members, as individual differences diminish, and also to produce
convergence on a relatively more extreme point among pre-deliberation
judgments”

Active shift: Group Polarization
- Partisan polarization is common in political groups
- It can boost political discussions and engagement
- Cross-ideological exposure mitigates echo chambers
On the evidence of cross-ideological political
discourse, Garrett (2009) points out that even if
selective exposure occurs for individuals and online
news, “people do not seek to completely exclude other
perspectives from their political universe, and there is
little evidence that they will use the Internet to create
echo chambers, devoid of other viewpoints, no matter
how much control over their political informative
environment they are given.
To the contrary, the longer read times associated
with opinion-challenging information suggest that
people may wish to maintain awareness of diverse
political views (while ensuring that their own beliefs
are well supported)”.

Cross-ideological exposure
- Psycholinguistic factors interplaying
- Socialization compromises polarizations
- Media environment is high-choice, i.e. a heterogeneous
aggregate of information sources

Given these variables, how do we model causal inference?
Selective Exposure (influenced by online behaviour) → Filter Bubbles → Echo Chambers → Active shift: Group Polarization
Cross-ideological exposure
Chain Graphs?

Network-oriented modelling based on temporal-causal networks (?)
The Network-Oriented Modelling
approach based on temporal-causal
networks is a generic and declarative
dynamic modelling approach based on
networks of causal relations. Dynamics
are addressed by incorporating a
continuous time dimension.
This temporal dimension enables
modelling by networks that inherently
contain cycles, such as networks
modelling mental or brain processes, or
social interaction processes. It also
makes it possible to address the timing of the
processes in a differentiated manner.
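A rough sketch of what "incorporating a continuous time dimension" can mean in practice: each state moves toward the aggregated impact of its incoming connections at its own speed, integrated here with small Euler steps. The update rule follows the general form used in the temporal-causal network literature as far as I understand it, and should be read as an assumption. The combination function (a weight-scaled sum), the weights, the speeds, and all names are invented for the example.

```python
def simulate_temporal_causal(states, weights, speed, dt=0.1, steps=200):
    """Euler simulation; weights[(x, y)] is the strength of x -> y, speed[y] the rate of y."""
    history = [dict(states)]
    for _ in range(steps):
        current = history[-1]
        nxt = {}
        for y in current:
            incoming = [(w, current[x]) for (x, target), w in weights.items() if target == y]
            if incoming:
                total_weight = sum(w for w, _ in incoming)
                # Assumed combination function: weight-scaled sum of incoming states.
                aggregated = sum(w * v for w, v in incoming) / total_weight
                nxt[y] = current[y] + speed[y] * (aggregated - current[y]) * dt
            else:
                nxt[y] = current[y]
        history.append(nxt)
    return history

# Two agents influencing each other: a cycle, which a DAG could not represent.
states = {"A": 1.0, "B": 0.0}
weights = {("A", "B"): 1.0, ("B", "A"): 1.0}
speed = {"A": 0.5, "B": 0.5}

history = simulate_temporal_causal(states, weights, speed)
final = history[-1]
print(round(final["A"], 3), round(final["B"], 3))
```

In this toy case the two opinions converge to a shared value; different speeds per state are what allow timing to be treated "in a differentiated manner".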

Network Interferences: Information Diffusion & Contagion
What caused A to agree/disagree with B? Can we build a model to forecast and retrieve rhetorical behaviors, causal
reasoning, and alignments, quantifying all the network interferences that shape information diffusion and contagion?
Lexical Analysis:
● Linguistic Inquiry and Word Count (LIWC)
● NLTK: word counts & frequencies
Semantic Analysis:
● Comparison between text corpora:
○ Soft cosine similarity (softcossim)
○ KL divergence
● Topic Modeling:
○ Latent Semantic Analysis
○ Latent Dirichlet Allocation
● Sentiment Analysis
● Word Embeddings:
○ Word2Vec
○ GloVe
○ FastText
● Sentence Embeddings:
○ FastText
○ Doc2Vec
○ Sent2Vec
Causal Reasoning:
● Structural Equation Models
● Chain Graphs
○ Directed Acyclic Graphs (DAGs)
Natural Language Understanding:
● CommonSense Inference (semantic entailment):
○ Event2Mind
○ ATOMIC
○ SWAG
● Reading Comprehension, Sentence Prediction:
○ Google’s BERT
○ ELMo
Pragmatic Distortion:
○ Linguistic Ambiguity (e.g. lexical constraints)
○ Semantic Ambiguity (e.g. speech acts, sense and reference, implicatures, etc.)
Network Interference:
● Network Structure
○ Social Network Ties
● Engagement
○ Attention
○ Responsiveness
○ Interests
● News Diet
○ News sources
○ News agenda
● Network Exposure
○ Filter Bubbles
○ Echo Chambers
● Behavioral patterns
○ Selective Exposure (Homophily)
○ Epistemic Authority

Endogenous & Exogenous variables: a sketch
[Diagram spanning Information Diffusion to Information Contagion.
Stages: Information Environment, Information Reception, Information Trading, Information Flow.
Variables: News, Filter Bubbles, Engagement, Selective Exposure, Algorithmic Bias, Network Ties,
Network Structure, Agents’ Interplay, Sources’ Interplay, Semantic Tendencies, Linguistic Tendencies,
Dialogical Interplay.
Fields: Complex Networks - Information Studies, Cognitive Science, Semiotics - Pragmatics.
Methods: Qual-Quant, Chain Graphs/SCMs, NLP/NLU.]

Future Directions
A pipeline composed of NLU commonsense models, DAGs, and other mixed chain
graphs? How to deal with different timescales in a dynamic framework?
How to harness endogenous and exogenous variables?

After this Symposium

Bonus part: Let’s play around
Choose a topic and some keywords!
- Data Scraping through the API: GetOldTweets3
- Data Preprocessing: text stripping and normalization
- Data Wrangling: LDA + FastText
- Data Visualization: NetworkX - users’ interactions
Disclaimer: I hope that my CPU, Conda, and Python frameworks will
allow me to do that
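The last step, turning tweets into a graph of users' interactions, can be sketched without any scraping: given (author, text) pairs, extract @mentions and count directed edges. The sample tweets and user names are invented, and the scraping step with GetOldTweets3 is deliberately left out.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def mention_edges(tweets):
    """Count directed (author -> mentioned user) edges, ignoring self-mentions."""
    edges = Counter()
    for author, text in tweets:
        for mentioned in MENTION.findall(text):
            if mentioned.lower() != author.lower():
                edges[(author.lower(), mentioned.lower())] += 1
    return edges

# Invented sample tweets, for illustration only.
tweets = [
    ("alice", "@bob did you watch the debate? #debatenight"),
    ("bob", "@alice yes! @carol what do you think?"),
    ("carol", "RT @bob: @alice yes!"),
    ("alice", "@bob again"),
]

edges = mention_edges(tweets)
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, weight)
```

The resulting weighted edge list could then be loaded into a NetworkX directed graph (for instance with `DiGraph.add_weighted_edges_from`) for visualization.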

● Peters, J., Janzing, D. and Schölkopf, B., 2017. Elements of causal inference: foundations and learning algorithms. MIT press.
● Christiansen, M.H. and Chater, N., 2016. Creating language: Integrating evolution, acquisition, and processing. MIT Press.
● Hume, D., 2012. A treatise of human nature (1739). Courier Corporation.
● Pearl, J., 2000. Causality: models, reasoning and inference. Cambridge: Cambridge University Press.
● Goodman, N.D., Ullman, T.D. and Tenenbaum, J.B., 2011. Learning a theory of causality. Psychological Review, 118(1), p.110.
● Tylén, K., Weed, E., Wallentin, M., Roepstorff, A. and Frith, C.D., 2010. Language as a tool for interacting minds. Mind & Language, 25(1),
pp.3-29.
● Fusaroli, R., Bahrami, B., Olsen, K., Roepstorff, A., Rees, G., Frith, C. and Tylén, K., 2012. Coming to terms: quantifying the beneﬁts of
linguistic coordination. Psychological science, 23(8), pp.931-939.
● Wang, Z. and Culotta, A., 2019, July. When Do Words Matter? Understanding the Impact of Lexical Choice on Audience Perception Using
Individual Treatment Effect Estimation. In Proceedings of the AAAI Conference on Artiﬁcial Intelligence (Vol. 33, pp. 7233-7240).
● Grice, H.P., 1975. Logic and conversation. In P. Cole & J. Morgan (eds.), Syntax and Semantics, Vol. 3, 41-58. New York: Academic Press.
● Grice, H. P., 1981. Presupposition and conversational implicature. In P. Cole (ed.), Radical Pragmatics, 183–198. New York: Academic Press.
● Wilson, D., and Sperber, D., 2002. Relevance theory.
● Tylén, K., Fusaroli, R., Bundgaard, P.F. and Østergaard, S., 2013. Making sense together: A dynamical account of linguistic meaning-making.
Semiotica, 2013(194), pp.39-62.
● LaPolla, R.J., 2015. On the logical necessity of a cultural connection for all aspects of linguistic structure. In Rik De Busser & Randy J.
LaPolla (eds.), Language Structure and Environment: Social, Cultural, and Natural Factors, 33-44. Amsterdam & Philadelphia: John
Benjamins.
● Hopper, P., 2012. Emergent grammar. In James Gee & Michael Handford (eds.), The Routledge handbook of discourse analysis, 301-314.
London & New York: Routledge.
● Baglini, R. Direct causation: A new approach to an old question. PLC U. Penn Working Papers in Linguistics. Submitted;26.
References Consulted

● Frege, G., 1892. On sense and meaning. Translations from the philosophical writings of Gottlob Frege, 3, pp.56-78.
● Lenci, A., 2008. Distributional semantics in linguistic and cognitive research. Italian journal of linguistics, 20(1), pp.1-31.
● Erk, K., 2016. What do you know about an alligator when you know the company it keeps?. Semantics and Pragmatics, 9, pp.17-1.
● Nannini, L., 2019. Analyzing semantic contagion of mass entrainment in tweets produced during 2016 U.S. first presidential debate. [online]
Google Docs. Available at: https://docs.google.com/document/d/15iUWQeGP_y3h0zupZ1xxdMPN66eLau4xWgRS13lCQIc/edit?usp=sharing
● Pennebaker, J.W., Boyd, R.L., Jordan, K. and Blackburn, K., 2015. The development and psychometric properties of LIWC2015.
● Faasse, K., Chatman, C.J. and Martin, L.R., 2016. A comparison of language use in pro-and anti-vaccination comments in response to a high
proﬁle Facebook post. Vaccine, 34(47), pp.5808-5814.
● Mitra, T., Counts, S. and Pennebaker, J.W., 2016, March. Understanding anti-vaccination attitudes in social media. In Tenth International AAAI
Conference on Web and Social Media.
● Bojanowski, P., Grave, E., Joulin, A. and Mikolov, T., 2017. Enriching word vectors with subword information. Transactions of the Association for
Computational Linguistics, 5, pp.135-146.
● Darling, W.M., 2011, December. A theoretical and practical implementation tutorial on topic modeling and gibbs sampling. In Proceedings of the
49th annual meeting of the association for computational linguistics: Human language technologies (pp. 642-647).
● Ramage, D., Dumais, S. and Liebling, D., 2010, May. Characterizing microblogs with topic models. In Fourth international AAAI conference on
weblogs and social media.
● Ritter, A., Cherry, C. and Dolan, B., 2010, June. Unsupervised modeling of twitter conversations. In Human Language Technologies: The 2010
Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 172-180). Association for Computational
Linguistics.
● Coppersmith, G., Dredze, M. and Harman, C., 2014, June. Quantifying mental health signals in Twitter. In Proceedings of the workshop on
computational linguistics and clinical psychology: From linguistic signal to clinical reality (pp. 51-60).
● Bowman, S.R., Angeli, G., Potts, C. and Manning, C.D., 2015. A large annotated corpus for learning natural language inference. arXiv preprint
arXiv:1508.05326.
● Lopez-Paz, D., Muandet, K., Schölkopf, B. and Tolstikhin, I., 2015, June. Towards a learning theory of cause-effect inference. In International
Conference on Machine Learning (pp. 1452-1461).
● Ogburn, E.L., Shpitser, I., and Lee, Y., 2018. Causal inference, social networks, and chain graphs. arXiv preprint arXiv:1812.04990.

● Bhattacharya, R., Malinsky, D. and Shpitser, I., 2019. Causal Inference Under Interference And Network Uncertainty. arXiv preprint arXiv:1907.00221.
● Gray, V., 2019. How a 16-year-old got us to care about climate change. [online] Pulsar Platform. Available at:
https://www.pulsarplatform.com/blog/2019/how-a-16-year-old-got-us-to-care-about-climate-change/?fbclid=IwAR2AMbSIzPuFD5_6mqSxMPboNk
7bWRu_8YLRLsdemBGa0yQ7DvWFyXw4VUc [Accessed 27 Sep. 2019].
● Kang, G.J., Ewing-Nelson, S.R., Mackey, L., Schlitt, J.T., Marathe, A., Abbas, K.M. and Swarup, S., 2017. Semantic network analysis of vaccine
sentiment in online social media. Vaccine, 35(29), pp.3621-3638.
● Pinto, J. C. L., & Chahed, T. 2014. Modeling Multi-topic Information Diffusion in Social Networks Using Latent Dirichlet Allocation and Hawkes
Processes. 2014 Tenth International Conference on Signal-Image Technology and Internet-Based Systems, 339–346
● Romero, D. M., Meeder, B., & Kleinberg, J. 2011. Differences in the Mechanics of Information Diffusion Across Topics: Idioms, Political Hashtags,
and Complex Contagion on Twitter. Proceedings of the 20th International Conference on World Wide Web, 695–704. New York, NY, USA: ACM.
● Yang, J., & Leskovec, J. 2010. Modeling Information Diffusion in Implicit Networks. 2010 IEEE International Conference on Data Mining, 599–608.
● Kafeza, E., Kanavos, A., Makris, C., & Vikatos, P. 2014. Predicting Information Diffusion Patterns in Twitter. Artificial Intelligence Applications and
Innovations, 79–89. Springer Berlin Heidelberg.
● Aral, S., Muchnik, L., & Sundararajan, A. (2009). Distinguishing influence-based contagion from
homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51), 21544–21549.
● Yardi, S., & Boyd, D. (2010). Dynamic Debates: An Analysis of Group Polarization Over Time on Twitter. Bulletin of Science, Technology & Society,
30(5), 316–327.
● Kossinets, G., & Watts, D. J. (2009). Origins of Homophily in an Evolving Social Network. The American Journal of Sociology, 115(2), 405–450.
● Wojcieszak, M. E., & Mutz, D. C. (2009). Online Groups and Political Discourse: Do Online Discussion Spaces Facilitate Exposure to Political
Disagreement? The Journal of Communication, 59(1), 40–56.
● Centola, D., & Macy, M. (2007). Complex Contagions and the Weakness of Long Ties. The American Journal of Sociology, 113(3), 702–734.
● Speriosu, M., Sudan, N., Upadhyay, S., & Baldridge, J. (2011). Twitter Polarity Classification with Label Propagation over Lexical Links and the
Follower Graph. Proceedings of the First Workshop on Unsupervised Learning in NLP, 53–63. Stroudsburg, PA, USA: Association for
Computational Linguistics.
● Sunstein, C. R. (2002). The law of group polarization. The Journal of Political Philosophy.
● Weeks, B.E., Ksiazek, T.B. and Holbert, R.L., 2016. Partisan enclaves or shared media experiences? A network approach to understanding
citizens’ political news environments. Journal of Broadcasting & Electronic Media, 60(2), pp.248-268.