Network Analysis of the 2016 U.S.A. Presidential Campaign Tweets
Dmitry Zinoviev
Math and Computer Science Department
Suffolk University, Boston, MA
July 13, 2018 IC2S2 Evanston, IL 2
Project Goals
• Apply complex network analysis (CNA) methods to the complete corpora of tweets posted by the major candidates in the 2016 USA presidential election
• Identify common topics of discourse
• Identify followers and leaders
Dataset Summary
Candidate   # of Tweets   Words per Tweet   Campaign Start   Campaign Length (months)   Campaign End
Trump**     9,984         22.4              Mar 2015*        22                         Dec 2016*
Stein       6,215         26.6              Jul 2015         12                         Dec 2016*
Clinton**   6,082         21.6              Apr 2015         20                         Nov 2016
Cruz        4,501         17.5              Mar 2015*        21                         Nov 2016
Sanders     4,385         25.4              Mar 2015*        22                         Dec 2016*
Kasich      3,687         20.8              Mar 2015*        21                         Dec 2016*
Rubio       2,933         19.0              Apr 2015         15                         Aug 2016
Total:      37,733
* The limit of the observation period
** More data used in the pilot study
Pilot Study: Trump/Clinton Tweets
• Check whether CNA is applicable to tweet corpora
• Subdivide the corpus into 48/20 monthly sub-corpora (Trump/Clinton)
• Identify the vocabulary (the most frequently used lemmas, excluding stop words)
• Build a correlation network of sub-corpora:
  ● Two sub-corpora are “similar” if their vocabulary frequencies are at least 60% correlated
• Apply a community detection algorithm
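The correlation-network step can be sketched as follows. This is a minimal illustration, not the author's code: the toy frequency vectors, the helper names `pearson` and `correlation_edges`, and the reading of "60% correlated" as a Pearson coefficient of at least 0.6 are all assumptions.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def correlation_edges(freqs, threshold=0.6):
    """Link two sub-corpora when their vocabulary frequencies correlate at or above the threshold."""
    edges = []
    for a, b in combinations(sorted(freqs), 2):
        r = pearson(freqs[a], freqs[b])
        if r >= threshold:
            edges.append((a, b, r))
    return edges

# Hypothetical term-frequency vectors over a shared four-term vocabulary,
# keyed by the month of the sub-corpus.
freqs = {
    "2015-04": [10, 2, 0, 5],
    "2015-05": [9, 3, 1, 4],
    "2016-11": [0, 8, 9, 1],
}
edges = correlation_edges(freqs)
```

In this toy input only the two adjacent months are linked, which is the kind of month-to-month chaining that would produce a quasi-linear longitudinal network. A community detection algorithm would then be run on the resulting graph.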
Correlation Network of Tweets
[Figure annotations (DT = Trump, HC = Clinton): DT's intention to run; HC's intention to run; HC joins the campaign; DT de facto wins primaries; election]
Observations
• The tweet sub-corpora of both candidates form a quasi-linear longitudinal network
• The detected network communities match the major campaign stages
• However, the candidates' networks are disjoint, because either:
  ● The candidates' agendas are indeed disconnected, or
  ● The CNA tool used is too low-resolution
!!!
All-Candidate Study
• More candidates
  ● Added Stein, Rubio, Cruz, Sanders, and Kasich
• Biweekly sub-corpora
• Different vocabulary construction procedure
• Semantic networks vs. correlation networks
Overall Bi-weekly Twitter Activity
Vocabulary Construction
• Automated, based on the Corpus of Contemporary American English (COCA)
  ● Not in COCA, or less frequent than 25 × mean(COCA) ⇒ rare
  ● Occurs at least 10 times in the tweets and more frequent than 25 × mean(tweets) ⇒ significant
• 1,416 terms (including hashtags):
  ● '-john', '2-party', '2016', '47246', 'ability', 'abolish', 'abolishtheirs', 'abortion', 'absolutely', 'absurd', …, 'worried', 'worse', 'worst', 'worth', 'written', "y'all", 'yesterday', 'youth', 'zero', '—hillary'
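One possible reading of the two rules above, as a rough sketch. The slide does not say how "rare" and "significant" are combined, so requiring both is an assumption here, as are the use of raw counts for "frequency", the function name `build_vocabulary`, and all of the toy counts.

```python
def build_vocabulary(tweet_counts, coca_counts, factor=25, min_count=10):
    """Keep terms that are rare in the reference corpus and significant in the tweets."""
    coca_mean = sum(coca_counts.values()) / len(coca_counts)
    tweet_mean = sum(tweet_counts.values()) / len(tweet_counts)
    vocabulary = []
    for term, count in tweet_counts.items():
        # Rare: absent from the reference corpus, or below factor x its mean count.
        rare = term not in coca_counts or coca_counts[term] < factor * coca_mean
        # Significant: at least min_count occurrences and above factor x the tweet mean.
        significant = count >= min_count and count > factor * tweet_mean
        # Assumption: a term must satisfy both rules to enter the vocabulary.
        if rare and significant:
            vocabulary.append(term)
    return sorted(vocabulary)

# Hypothetical counts; the filler terms only pull the means down.
tweet_counts = {"the": 400, "abolish": 300, "abolishtheirs": 150}
tweet_counts.update({f"filler{i}": 1 for i in range(200)})
coca_counts = {"the": 1_000_000, "abolish": 50}
coca_counts.update({f"filler{i}": 100 for i in range(100)})

vocabulary = build_vocabulary(tweet_counts, coca_counts)
```

With these toy counts, "the" is dropped because it is far too common in the reference corpus, the filler terms are dropped because they occur only once in the tweets, and the two remaining terms are kept.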
Semantic Network Construction
• For each candidate
  ● For each biweekly sub-corpus:
    – Create a network of terms by treating the terms as nodes and their co-occurrence in the same tweet as edge weight (287 networks)
    – Apply a community detection algorithm [Blondel] to identify local topics addressed by this candidate over this two-week span (2,518 communities)
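The network-building step above can be sketched in a few lines. This is an illustration under assumptions: tweets are taken to be already tokenized and lemmatized, multi-word terms are ignored, and the name `cooccurrence_network` and the sample tweets are hypothetical. The community detection step is not reproduced here.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(tweets, vocabulary):
    """Return {(term_a, term_b): weight}; weight counts the tweets that contain both terms."""
    weights = Counter()
    for tokens in tweets:
        # Keep only vocabulary terms, each counted once per tweet.
        present = sorted(set(tokens) & vocabulary)
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1
    return weights

# Hypothetical vocabulary and one candidate's tweets for a single two-week span.
vocabulary = {"climate", "energy", "fuel", "wage", "worker"}
tweets = [
    ["climate", "fuel", "energy", "now"],
    ["climate", "energy"],
    ["wage", "worker", "fair"],
]
network = cooccurrence_network(tweets, vocabulary)
```

Sorting the terms before pairing keeps each undirected edge under a single key, so ("climate", "energy") and ("energy", "climate") are never counted separately.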
Sample Network: Trump, 2nd half of March 2016
Recurring Topic Extraction
• Construct a network of communities (local topics):
  ● All communities as nodes
  ● Jaccard index as edge weight
• Apply a community detection algorithm to identify recurring topics (later referred to as “topics”) over all candidates and time spans
  ● A term may belong to more than one topic ⇒ treat a topic as a fuzzy set with a membership function
• 17 topics
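The edge weights of the community network can be illustrated with a small sketch. The Jaccard index of two term sets is the size of their intersection divided by the size of their union. The community contents, the dates in the keys, and the choice to drop zero-weight edges are assumptions made for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def community_network(communities):
    """Return {(id_a, id_b): weight} for every pair of communities that share a term."""
    edges = {}
    for (i, a), (j, b) in combinations(communities.items(), 2):
        w = jaccard(a, b)
        if w > 0:  # Assumption: disjoint communities get no edge.
            edges[(i, j)] = w
    return edges

# Hypothetical local topics, keyed by (candidate, time span).
communities = {
    ("Sanders", "2015-08-02"): {"climate", "fuel", "energy"},
    ("Clinton", "2015-08-16"): {"climate", "energy", "job"},
    ("Trump", "2016-03-20"): {"america", "great"},
}
edges = community_network(communities)
```

Here the two climate-related communities share two of four distinct terms, so their edge weight is 0.5, while the third community stays isolated. Running community detection on this weighted graph would group such overlapping local topics into one recurring topic.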
Major Recurring Topics
1. hillary (112); campaign (105); trump (101); clinton (91); vote (82)...
2. america (149); great (100); make america (57); america great (54); make america great (51)...
3. obama (52); iran (30); immigration (28); president obama (26); terrorism (20)...
4. tonight (62); tune (43); sander (42); town (39); town hall (38)...
5. wage (46); minimum (31); minimum wage (31); worker (30); living (30)...
6. forward (42); looking (39); looking forward (35); tomorrow (24); ticket (8)...
7. tax (51); policy (37); foreign (20); foreign policy (18); military (18)...
8. health (60); health care (51); obamacare (29); republican (27); drug (23)...
9. student (35); debt (31); student debt (23); economy (18); wall street (18)...
10. climate (57); climate change (56); fossil fuel (38); fuel (38); energy (36)...
Reading the counts: “hillary (112)” means that 112 out of 2,518 communities mention the term 'hillary'.
Major Recurring Topics (cont.)
11. failing (14); prayer (12); nytimes (11); thought (7); dishonest (5)...
12. justice (13); criminal (11); police (11); equality (8); officer (7)...
13. united (64); united states (57); citizen (12); america (5)
14. happy (13); birthday (8)
15. conservative (20); courageous conservative (11); courageous (11)
16. lesser evil (10); evil (10); lesser (10); greater (7)
17. god (8); bless (7)
Tracking Topic Popularity
• For each recurring topic
  ● For each candidate
    ● For each time span:
      – Calculate the Pearson correlation ρ between the frequencies of the terms in the topic and in the sub-corpus
      – ρ ≈ 1 ⇒ the candidate consistently used the topic vocabulary over the time span
      – ρ ≈ 0 ⇒ the candidate disregarded the topic vocabulary over the time span
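A minimal sketch of the popularity score, under assumptions: the topic weights are the community counts listed for topic 10, the sub-corpus counts are invented, and a sub-corpus that never uses the topic terms is scored 0.0 by convention (Pearson correlation is undefined for a constant vector). The name `topic_popularity` is hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def topic_popularity(topic_weights, subcorpus_counts):
    """Correlate a topic's term weights with the same terms' counts in one sub-corpus."""
    terms = sorted(topic_weights)
    x = [topic_weights[t] for t in terms]
    y = [subcorpus_counts.get(t, 0) for t in terms]
    return pearson(x, y)

# Term weights of topic 10, as listed on the "Major Recurring Topics" slide.
topic_10 = {"climate": 57, "climate change": 56, "fossil fuel": 38, "fuel": 38, "energy": 36}

# Hypothetical term counts for two sub-corpora.
engaged = {"climate": 12, "climate change": 11, "fossil fuel": 7, "fuel": 8, "energy": 6}
silent = {"wage": 9, "worker": 4}

rho_engaged = topic_popularity(topic_10, engaged)
rho_silent = topic_popularity(topic_10, silent)
```

The first sub-corpus uses the topic terms in roughly the same proportions as the topic itself and scores close to 1; the second never mentions them and scores 0.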
Example: Donald Trump
The numbers in the legend refer to the topic IDs on the list of topics.
Example: Hillary Clinton
Example: Bernie Sanders
Tracking Followers
• For each recurring topic
  ● For each candidate
    ● For each time span:
      – Does any local topic of the candidate contribute to the recurring topic during this time span, but not in the previous time span (two weeks earlier)?
      – If so, this candidate follows the candidates whose local topics contributed to the same recurring topic two weeks earlier
      – Those candidates are the leaders
• Construct a graph of followers and leaders
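The follower rule above can be sketched for a single recurring topic. The input format (one set of contributing candidates per consecutive two-week span), the function name `follower_edges`, and the sample data are assumptions for illustration.

```python
from collections import Counter

def follower_edges(participation):
    """Count (follower, leader) pairs for one topic.

    participation: list of sets of candidates, one set per consecutive time span.
    A candidate who contributes now but did not in the previous span follows
    every candidate who contributed in the previous span.
    """
    edges = Counter()
    for previous, current in zip(participation, participation[1:]):
        newcomers = current - previous
        for follower in newcomers:
            for leader in previous:
                edges[(follower, leader)] += 1
    return edges

# Hypothetical participation in one topic over four consecutive spans.
participation = [
    {"Sanders", "Trump"},
    {"Sanders", "Clinton"},
    {"Clinton"},
    {"Clinton", "Sanders"},
]
edges = follower_edges(participation)
```

Summing these counters over all recurring topics would give the weighted, directed graph of followers and leaders.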
Example: Topic #10 “Climate change”
Timespan     Participants/Leaders       Newcomers/Followers
2015-08-02   Sanders, Trump             Clinton
2015-09-13   Sanders                    Rubio
2015-09-27   Sanders, Rubio             Clinton
2015-10-11   Clinton                    Sanders
...          ...                        ...
2016-07-03   Sanders, Stein             Clinton
2016-09-11   Sanders, Stein             Rubio
2016-10-23   Stein                      Sanders, Clinton
2017-01-01   Sanders                    Stein
Engagement and Leadership
Engagement = #leads + #follows
Leadership = #leads - #follows
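The two measures can be computed directly from a follower/leader edge list. This sketch assumes that #leads and #follows count weighted (follower, leader) edges; the slide does not define the counting unit, and the edge data and the name `engagement_and_leadership` are hypothetical.

```python
from collections import Counter

def engagement_and_leadership(edges):
    """Return {candidate: (engagement, leadership)} from {(follower, leader): count}."""
    leads, follows = Counter(), Counter()
    for (follower, leader), n in edges.items():
        follows[follower] += n
        leads[leader] += n
    candidates = set(leads) | set(follows)
    return {
        c: (leads[c] + follows[c], leads[c] - follows[c])  # (engagement, leadership)
        for c in candidates
    }

# Hypothetical edge counts: (follower, leader) -> number of times observed.
edges = {
    ("Clinton", "Sanders"): 2,
    ("Clinton", "Trump"): 1,
    ("Sanders", "Clinton"): 1,
}
scores = engagement_and_leadership(edges)
```

A positive leadership score means a candidate was followed more often than they followed others; engagement measures total involvement regardless of direction.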
The Leaders and the Followers
Conclusion
• Complex network analysis (CNA) can be used to reveal recurring topics in the candidates' tweets and the leader/follower relationships
• There are fewer than 20 major recurring topics owned or shared by the candidates over the campaign timespan
• Trump and Sanders were the most followed candidates
• Kasich and Rubio were the least followed candidates
Acknowledgement
The author is grateful to Prof. Elena Llaudet
(Department of Government, Suffolk University) for her
absolutely indispensable suggestions.
