Political astroturfing and organised trolling are malicious online behaviours with significant real-world effects. Common approaches to examining these phenomena focus on broad campaigns rather than the small groups responsible for them. To reveal networks of cooperating accounts, we propose a novel temporal window approach that relies on account interactions and metadata alone. It detects groups of accounts engaging in behaviours that, in concert, execute different goal-based strategies, which we describe. We validate our approach against two relevant datasets with ground-truth data. See https://github.com/weberdc/find_hccs for code and data.
Presented at ASONAM'20 (2020 IEEE/ACM International Conference on Advances in Social Network Analysis and Mining).
Co-authored with Frank Neumann (University of Adelaide)
Who’s in the Gang? Revealing Coordinating Communities in Social Media
1. OFFICIALASONAM 7-10 Dec 2020
Derek Weber1,2 & Frank Neumann1
Contact: derek.weber@adelaide.edu.au
1 School of Computer Science, University of Adelaide, Australia.
2 Defence Science and Technology Group, Department of Defence, Australia.
WHO’S IN THE GANG?
REVEALING COORDINATING COMMUNITIES IN SOCIAL MEDIA
https://twitter.com/conspirator0/status/1328479128908132358 17 Nov 2020
Context
• Social media for political communication
• Targeted marketing → (Political) Spam & recruitment
• Anonymity → Trolls
• Automation → Bots, social bots & political bots
Targeted marketing + Anonymity + Automation = Interference
Information Campaigns & Coordination Strategies
[Figure: an information campaign proceeds from Intent → Strategy → Planning → Execution. Three coordination strategies are illustrated as account-action timelines (t0–t3):
• Pollution – junk posts mixed in with good posts in a channel, e.g., #OurPartyRocks (Woolley, 2016; Fisher, 2018; Nasim et al., 2018)
• Boost – reposting the same posts to amplify them (Cao et al., 2015; Vo et al., 2017; Graham et al., 2020)
• Bully – hostile and friendly interactions directed at targets (Hine et al., 2017; Kumar et al., 2018)]
The Challenge
• Discovery
• RQ1 How can highly coordinating communities (HCCs) be found?
• Validation
• RQ2 How do the discovered communities differ?
• RQ3 How consistent is the HCC messaging?
• RQ4 Are the HCCs internally or externally focused?
To identify groups of accounts whose behaviour,
though typical in nature, is anomalous in degree.
Extract HCCs
Focal Structures Analysis – Variant (FSA_V); cf. Şen et al., 2016
https://github.com/weberdc/find_hccs
Evaluation
• Window size, γ = {15, 60, 360, 1440} minutes
• Community extraction:
• FSA_V, θ = 0.3
• k-Nearest Neighbour (kNN), k = ln(|V|) (cf. Cao et al., 2015)
• Threshold
• Coordination strategy
• Boost (co-retweet)
• Pollute (co-hashtag)
• Bully (co-mention)
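As a hedged sketch of the kNN extraction option (the adjacency structure here is illustrative, not the released implementation), each account in the latent coordination network keeps only its k = ln(|V|) heaviest-weighted neighbours:

```python
import math

def knn_prune(adjacency, k=None):
    """Retain each node's k heaviest-weighted neighbours.

    `adjacency` maps node -> {neighbour: weight} (illustrative structure).
    k defaults to ln(|V|), rounded, cf. Cao et al. (2015).
    """
    if k is None:
        k = max(1, round(math.log(len(adjacency))))
    pruned = {}
    for node, nbrs in adjacency.items():
        # Sort neighbours by descending weight and keep the top k.
        top = sorted(nbrs.items(), key=lambda kv: -kv[1])[:k]
        pruned[node] = dict(top)
    return pruned

adjacency = {
    'a': {'b': 5.0, 'c': 1.0, 'd': 0.2},
    'b': {'a': 5.0, 'c': 0.1},
    'c': {'a': 1.0, 'b': 0.1},
    'd': {'a': 0.2},
}
print(knn_prune(adjacency))  # |V| = 4, so k = round(ln 4) = 1
```

With four nodes, k rounds to 1, so each account keeps only its single strongest co-action partner.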
Data
DS1 – Australian regional election, 2018
• Including ground truth (GT, cf. Keller et al., 2017)
DS2 – Twitter’s election integrity dataset1
• Internet Research Agency, 2016 tweets
1 https://about.twitter.com/en_us/values/elections-integrity.html
        Tweets (T)   Retweets (RT)    Accts (A)   Days   T/A/Day   RT/A/Day
DS1     115.9k       64.2k (54.5%)    20.6k       18     0.31      0.17
 - GT   4.2k         2.5k (59.7%)     134         18     1.74      1.04
DS2     1.57m        729.9k (56.6%)   1.4k        365    3.12      1.45
Ethics: University of Adelaide HREC H-2018-045
https://github.com/weberdc/find_hccs
Finding HCCs
• Coordination strategies: HCCs found under all three
• Many components (HCCs), incl. a very large one
• kNN – a single HCC with internal structure
[Figure: HCC networks found in DS1 and DS2 by FSA_V, kNN and Threshold extraction, alongside the GT networks]
Networks: Gephi https://gephi.org
Hashtags
[Figure: hashtags used by HCCs found via retweeting the same tweet vs. retweeting the same account, for GT, DS1 and DS2]
Consistency
Hypothesis
• Dissemination groups should have highly similar content
• i.e., internal similarity ≥ external similarity
Approach
• For each group:
• For each member:
• Combine the member's tweets into a corpus
• Compare 5-char n-grams of the corpus against all other accounts
• Plot similarities as a matrix, cf. a heatmap
[Figure: pairwise account similarity heatmaps for GT, DS1, DS2 and a RANDOM baseline]
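The per-account comparison described above can be sketched as follows; the slide specifies 5-char n-grams, but the similarity measure (Jaccard here) and the single-string corpora are my assumptions for illustration:

```python
def char_ngrams(text, n=5):
    """Set of character n-grams from an account's combined tweet corpus."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity between two n-gram sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

# Hypothetical corpora: one combined tweet string per account.
corpora = {
    'u1': 'my wife just told me she voted for joe biden',
    'u2': 'my wife just told me that she voted for joe biden',
    'u3': 'completely unrelated content about sports results',
}
grams = {u: char_ngrams(c) for u, c in corpora.items()}
# Pairwise similarity matrix, plottable as a heatmap.
matrix = {(u, v): jaccard(grams[u], grams[v]) for u in grams for v in grams}
print(round(matrix[('u1', 'u2')], 2), round(matrix[('u1', 'u3')], 2))
```

Near-identical copypasta yields high similarity; unrelated accounts sit near zero, which is what the heatmaps make visible at group scale.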
Literature
Campaign detection
• Content (Lee et al., 2013)
• URL sharing (Cao et al., 2015)
• Temporal signatures (Hine et al., 2017)
• Cross-platform linking (Starbird & Wilson, 2020)
Social bots
• Agenda-oriented automated accounts pretending to be human (Ferrara et al., 2016)
• Hard to identify (Cresci et al., 2017; Nasim et al., 2018; Grimme et al., 2018)
Coordination as “orchestrated activities”
• Focus on detecting strategies (Fisher, 2018; Grimme et al., 2018; Starbird et al., 2019;
Weber, 2019)
• Co-retweet (Weber, 2019; Graham et al., 2020)
• Co-hashtag (Woolley, 2016; Fisher, 2018)
• Co-URL (Cao et al., 2015; Giglietto et al., 2020)
• Brooking, E. T., and Singer, P. W. (2016). War Goes Viral: How social media is being weaponized across the world. The Atlantic. Retrieved
from https://www.theatlantic.com/magazine/archive/2016/11/war-goes-viral/501125/
• Cao, C., Caverlee, J., Lee, K., Ge, H. and Chung, J. 2015. Organic or Organized? Exploring URL Sharing Behavior. CIKM’15, 513–522.
• Cresci, S., Pietro, R. D., Petrocchi, M., Spognardi, A. and Tesconi, M. 2017. The Paradigm-Shift of Social Spambots. WWW’17 (Companion
Volume), 963-972.
• Ferrara, E., Varol, O., Davis, C., Menczer, F. and Flammini, A. 2016. The rise of social bots. Communications of the ACM. 59(7) (Jun. 2016),
96–104. DOI:10.1145/2818717.
• Fisher, A. 2018. Netwar in Cyberia: Decoding the Media Mujahidin. USC Centre on Public Diplomacy, Figueroa Press.
• Giglietto, F., Righetti, N., Rossi, L. and Marino, G. 2020. Coordinated Link Sharing Behavior as a Signal to Surface Sources of Problematic
Information on Facebook. SMSociety, 85-91.
• Grimme, C., Assenmacher, D. and Adam, L. 2018. Changing Perspectives: Is It Sufficient to Detect Social Bots? HCI (13) 2018, 445–461.
• Graham, T., Bruns, A., Zhu, G., and Campbell, R. 2020. Like a virus: The coordinated spread of coronavirus disinformation. Centre for
Responsible Technology, The Australia Institute.
• Hine, G. E., Onaolapo, J., Cristofaro, E. D., Kourtellis, N., Leontiadis, I., Samaras, R., Stringhini, G. and Blackburn, J. 2017. Kek, Cucks, and
God Emperor Trump: A Measurement Study of 4chan’s Politically Incorrect Forum and Its Effects on the Web. ICWSM’17, 92–101.
• Keller, F.B., Schoch, D., Stier, S. and Yang, J.H. 2017. How to Manipulate Social Media: Analyzing Political Astroturfing Using Ground Truth Data from South Korea. ICWSM’17, 564–567.
• Kumar, S., Hamilton, W.L., Leskovec, J. and Jurafsky, D. 2018. Community Interaction and Conflict on the Web. Proceedings of the 2018 World Wide Web Conference, WWW’18, 933–943.
References (1)
References (2)
• Lee, K., Caverlee, J., Cheng, Z. and Sui, D. Z. 2013. Campaign extraction from social media. ACM Transactions on Intelligent Systems and
Technology. 5(1), 9:1–9:28. DOI:10.1145/2542182.2542191.
• Lim, K. H., Jayasekara, S., Karunasekera, S., Harwood, A., Falzon, L., Dunn, J. and Burgess, G. 2019. RAPID: Real-time Analytics Platform for Interactive Data Mining. ECML/PKDD (3) 2018, 649–653.
• Nasim, M., Nguyen, A., Lothian, N., Cope, R. and Mitchell, L. 2018. Real-time Detection of Content Polluters in Partially Observable Twitter
Networks. WWW’18 (Companion Volume), 1331-1339.
• Pacheco, D., Hui, P.-M., Torres-Lugo, C., Truong, B. T., Flammini, A. and Menczer, F. 2020. Uncovering Coordinated Networks on Social Media. ICWSM’21, to appear.
• Rizoiu, M.-A., Graham, T., Zhang, R., Zhang, Y., Ackland, R. and Xie, L. 2018. #DebateNight: The Role and Influence of Socialbots on
Twitter During the 1st 2016 U.S. Presidential Debate. ICWSM’18, 300–309.
• Saulwick, A., & Trentelman, K. (2014). Towards a formal semantics of social influence. Knowledge-Based Systems, 71, 52–60.
DOI:10.1016/j.knosys.2014.06.022
• Şen, F., Wigand, R., Agarwal, N., Tokdemir, S., and Kasprzyk, R. 2016. Focal structures analysis: identifying influential sets of individuals in
a social network. Social Network Analysis and Mining, 6(1). DOI:10.1007/s13278-016-0319-z
• Starbird, K. and Wilson, T. 2020. Cross-Platform Disinformation Campaigns: Lessons Learned and Next Steps. Harvard Kennedy School
Misinformation Review. (Jan. 2020). DOI:10.37016/mr-2020-002.
• Starbird, K., Arif, A. and Wilson, T. 2019. Disinformation as Collaborative Work: Surfacing the Participatory Nature of Strategic Information Operations. Proc. ACM on Human-Computer Interaction. 3 (CSCW), 127:1–127:26. DOI:10.1145/3359229.
• Vo, N., Lee, K., Cao, C., Tran, T. and Choi, H. 2017. Revealing and detecting malicious retweeter groups. ASONAM’17, 363-368.
• Weber, D. 2019. On Coordinated Online Behaviour. Poster presented at ASNAC’19, Adelaide, Australia.
Editor's Notes
In mid-November there was a sudden spate of almost identical tweets posted, starting with “My wife just told me that she voted for Joe Biden. As we speak we are getting divorced and I’m leaving for” somewhere, “Papers signed everything is done.. Absolutely disgusted with the 2020 elections, what a disgrace”.
This collection was posted by a data scientist with an interest in propaganda on social media.
As you can see, the language is almost, but not quite, identical, and based on other analyses, these are not bots. An earlier instance of this “copypasta” related to the sale of a football club, and a huge number of accounts (https://twitter.com/conspirator0/status/1299127612804075523) posted pretty much the same message, which, in the scheme of things, is relatively harmless. This example, however, could have effects such as reinforcing the idea of rejecting election results, which hurts democratic systems.
This kind of activity could be regarded as an information campaign, especially if it was seeded or supported by a foreign adversary, so it’s important to be able to identify such campaigns. As these accounts aren’t likely to be bots, bot detection systems won’t be as much help, unless they’re used for retweeting.
What is, however, of particular interest, is finding the groups of accounts who are working together to do this, and to see how they’re coordinating their activities.
Cf. “Swarmcast” p.10 from Fisher, A. 2018. “Netwar in Cyberia: Decoding the Media Mujahidin”, USC Center on Public Diplomacy, Paper 5.
Online marketing used to be just spam; now it’s political advertising.
Anonymity is great for giving the disenfranchised a voice, but it also enables trolling, and I suspect Twitter’s latest ‘Fleets’ feature will make this worse.
Automation is great for news aggregators and sports announcement bots, but it allows accounts to post vast amounts of polarising, biased information, mis- and disinformation.
It’s now much easier for nation states to interfere in each other’s online discussion, particularly political discussion.
When I refer to coordination, there’s a spectrum.
Fundamentally, as described by Malone and Crowston in the 90s, it’s the alignment of dependencies between tasks and the resources they use.
At higher levels are Starbird et al’s descriptions of information campaigns being orchestrated (run from on high), cultivated (e.g., infiltrating existing issue-motivated groups), or emergent (like in conspiracy communities).
In between is the space where specific communication actions occur. In our case, we’re interested in social media communications, which have many parallels across platforms, so methods for detecting reposting may apply to Twitter retweets, Facebook shares, or Tumblr reposts.
Intent: convince a population to lose weight
Strategy: ad blitz + word of mouth + tax incentives
(Planning): talk to an ad company, design/disseminate flyers, craft legislation
Execution: TV ads, social media ads, flyers in fast food shops, pass laws & enforce
These are some coordination strategies observed in the literature, …
This is not an exhaustive list, of course, and coordination of activity may take many guises, but we’re focusing on co-actions, where a pair of accounts do the same thing to achieve their goal.
To find the highly coordinating communities, or HCCs, we first
[CLICK] Extract the abstracted common interaction behaviours from social media posts of a variety of platforms;
[CLICK] Then create a multi-digraph of interactions between users, hashtags and URLs.
[CLICK] This interaction graph is mined for evidence of coordination based on a search criterion, which is general and could be quite domain-specific. Examples of these include amplify-by-repost which is deliberate dissemination through content sharing, channel pollution through posting to particular hashtags or other communities, and coordinated attacks on a single user or community.
[CLICK] A latent coordination network is constructed from this evidence, being a weighted undirected network of users,
[CLICK] And this is then mined for the most highly coordinating communities.
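The latent coordination network built in these steps can be sketched for the co-retweet (Boost) case. The post fields and function name here are illustrative assumptions, not the released implementation:

```python
from collections import defaultdict
from itertools import combinations

def latent_coordination_network(posts):
    """Build a weighted, undirected user-user network from co-retweets.

    `posts` is a list of dicts with illustrative fields:
    {'user': str, 'retweet_of': tweet id or None}.
    Each time two accounts retweet the same tweet, the edge between
    them gains one unit of coordination evidence.
    """
    # Group retweeting users by the tweet they amplified.
    retweeters = defaultdict(set)
    for p in posts:
        if p.get('retweet_of'):
            retweeters[p['retweet_of']].add(p['user'])

    # Every pair of co-retweeters contributes to an undirected edge weight.
    edges = defaultdict(int)
    for users in retweeters.values():
        for u, v in combinations(sorted(users), 2):
            edges[(u, v)] += 1
    return dict(edges)

posts = [
    {'user': 'a', 'retweet_of': 't1'},
    {'user': 'b', 'retweet_of': 't1'},
    {'user': 'c', 'retweet_of': 't1'},
    {'user': 'a', 'retweet_of': 't2'},
    {'user': 'b', 'retweet_of': 't2'},
]
print(latent_coordination_network(posts))
# {('a', 'b'): 2, ('a', 'c'): 1, ('b', 'c'): 1}
```

Swapping the grouping key (hashtag, mentioned account, URL) yields the Pollute, Bully and co-URL variants without changing the pairing logic.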
That covers how to search for HCCs in social media data, but we need to consider its temporal aspect.
[CLICK] (shrink) [CLICK] (next slide)
Given a timeline of social media posts, we segment them into windows of gamma minutes.
We can vary the window depending on the nature of coordination sought.
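A minimal sketch of the windowing step (timestamps in seconds and payloads are illustrative; this is not the paper's code): assignment to a gamma-minute window is just integer division.

```python
from collections import defaultdict

def segment_into_windows(posts, gamma_minutes):
    """Bucket (timestamp_seconds, payload) pairs into gamma-minute windows."""
    window_seconds = gamma_minutes * 60
    windows = defaultdict(list)
    for ts, payload in posts:
        # Window index = how many full windows have elapsed since t=0.
        windows[ts // window_seconds].append(payload)
    return dict(windows)

posts = [(0, 'p1'), (600, 'p2'), (1000, 'p3'), (90000, 'p4')]
print(segment_into_windows(posts, 15))
# {0: ['p1', 'p2'], 1: ['p3'], 100: ['p4']}
```

Each window's posts are then mined independently, and per-window evidence is accumulated into the latent coordination network.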
Lots of variables, so I’ll mostly focus on 15m windows, FSA_V and Boost via co-retweet.
Threshold = retain heaviest normalised edges above 0.1
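The thresholding note above can be sketched as follows; normalising by the maximum edge weight is my assumption of what "normalised" means here, not a detail stated on the slide:

```python
def threshold_edges(edges, cutoff=0.1):
    """Keep edges whose max-normalised weight exceeds `cutoff`.

    `edges` maps (u, v) -> raw co-action count (illustrative structure).
    """
    if not edges:
        return {}
    max_w = max(edges.values())
    # Retain only the heaviest edges relative to the strongest pair.
    return {e: w / max_w for e, w in edges.items() if w / max_w > cutoff}

edges = {('a', 'b'): 20, ('a', 'c'): 2, ('b', 'c'): 1}
print(threshold_edges(edges))  # {('a', 'b'): 1.0}
```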
Clearly different groups are found.
The kNN results consist of a single large component, but I’ve used Gephi’s ForceAtlas and then Fruchterman–Reingold layouts to reveal internal structures, and then applied the Louvain method to identify the clusters by colour. I’ve done this for visualisation purposes, but have not analysed the networks any more deeply yet.
[CLICK] The final HCCs are the ones I discovered in the ground truth – each component consists of accounts from a different political party.
How similar is the membership between the HCCs discovered in different window sizes? Quite a lot of variation.
Different HCCs have clearly different content – looking at their hashtags, we can get a feel for what their interests are.
Internal vs External focus
Internal retweet ratio – how often do HCC members retweet one another?
Internal mention ratio – how often do they mention one another?
Remember that the members of HCCs need not be directly connected – all their connections may be inferred.
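A minimal sketch of the internal retweet ratio (the pair representation is illustrative; the internal mention ratio is analogous, with mentions in place of retweets):

```python
def internal_retweet_ratio(members, retweets):
    """Fraction of a group's retweets that amplify fellow group members.

    `retweets` is a list of (retweeter, original_author) pairs.
    """
    members = set(members)
    # Retweets posted by group members.
    by_group = [(u, v) for u, v in retweets if u in members]
    if not by_group:
        return 0.0
    # Of those, the ones amplifying another member.
    internal = [pair for pair in by_group if pair[1] in members]
    return len(internal) / len(by_group)

retweets = [('a', 'b'), ('a', 'x'), ('b', 'a'), ('b', 'y')]
print(internal_retweet_ratio({'a', 'b'}, retweets))  # 0.5
```

A ratio near 1 suggests an inward-focused echo chamber; near 0, a group amplifying outside content.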
Looked at Twitter data over the recent 2020 US Democratic and Republican Conventions back in August.
Using co-retweet and a 10 second window, I identified a number of communities, all of which Botometer tells us are highly bot-like, and many of which present themselves as normal people, but with greatly inflated tweeting rates.
Named accounts are organisational or have been deleted.
Co-hashtag with 10 second window to find the HCCs, then added the hashtags they use back in, to see which HCCs are associated.
Structure tells us something about behaviour:
Clusters around a few hashtags say that many groups are, in fact, one
Isolated stars tell us the accounts are pushing an agenda (the content of the hashtags) and no one else here is interested
Fans are pushing an agenda but have connected with other communities here via the linking hashtags
This tells me the community extraction method that I used (FSA_V) could do with a bit of tweaking and perhaps communities could be stitched back together.
Future:
Statistical measures
Evolution of HCCs
Simulation of coordination strategies
Campaign detection spawned out of spam detection but has relied on a number of features over the years.
Automation detection is particularly important for social bots, accounts that masquerade as real humans but are, in fact, automated.
It’s so hard to tell bots and humans apart that the real question is more about how they work together to achieve goals – how do they orchestrate their activities? For example, by disseminating content by retweeting the same tweets, or URLs, or using the same hashtags.