Matthew_Davis_Slides.pptx

Arizona State University
Types of Bots
Types of Bots: Categorizing Accounts
Using Unsupervised Machine Learning
Matthew Davis

Agenda
Motivation
Methods and Data
Experiments
Conclusion

Traditional Bot Detection Problem
Twitter Account → Features ($x_1, x_2, \ldots, x_d$) → Machine Learning Algorithm → Bot or Human

Motivations
1. Improving existing bot detection models
• Bots have become more complex over time
• Bot activities change over time and domain
2. Monitoring bots over time
• Individual bots may get banned over time
• New perspective for monitoring
3. Narrowing the scope of the bot detection problem
• Not necessary to remove all bots, only certain malicious types
• Helps both researchers and social media companies

Related Study: Example Typology of Bots
Bots are grouped by imitation of human behavior (rows) and intent (Benign, Neutral, Malicious).
Imitation of human behavior: Medium to High
• Benign: Chat bots
• Neutral: Humoristic bots
• Malicious: Astroturfing bots, Social botnets in politics, Infiltration bots, Influence bots, Sybils, Doppelgänger bots
Imitation of human behavior: Low to None
• Benign: News bots, Recruitment bots, Dissemination bots, Earthquake Warning bots, Editing bots
• Neutral: Nonsense bots
• Malicious: Spam bots, Botnet command bots, Pay bots
Stefan Stieglitz, Florian Brachten, Björn Ross, and Anna-Katharina Jung. “Do Social Bots Dream of Electric Sheep? A Categorization of Social Media Bot Accounts”. In: CoRR (2017), pp. 1–11.

Related Study: Yang Typology of Bots
• Simple bots: obvious to human users; post many tweets to make content visible
• Sophisticated bots: interact with human users by exploiting retweets, hashtags, and mentions
• Fake Followers: increase the popularity of, and lend credibility to, some other users on the network
• Botnets: use coordination to interact with each other and/or human users on the network
Kai-Cheng Yang, Onur Varol, Clayton A. Davis, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer. “Arming the public with artificial intelligence to counter social bots”. In: Human Behavior and Emerging Technologies (2019), e115.

Research Question
“Is it possible to automate the process of grouping bots by their respective types?”

Agenda
Motivation
Methods and Data
Experiments
Conclusion

Methodology
Set of bots → Unsupervised Machine Learning → Types
Unsupervised Machine Learning
• Clustering Algorithms: K-Means, GMM, Ward Hierarchical
• # of Clusters Selection
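The deck does not show how the clustering was implemented. As a rough illustration of the first algorithm listed, here is a minimal K-Means (Lloyd's algorithm) sketch using only the Python standard library. The function name `kmeans`, its parameters, and the toy points are assumptions made for this example; they are not taken from the slides.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Group `points` (a list of equal-length tuples) into k clusters.

    Hypothetical helper for illustration only; returns (labels, centroids).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:  # no assignment changed, so the run has converged
            break
        labels = new_labels
    return labels, centroids


# Toy feature vectors standing in for per-account features.
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
toy_labels, toy_centroids = kmeans(toy, k=2)
```

In practice each point would be one account's feature vector, and k would be chosen with a cluster-validity measure rather than fixed in advance.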

Features of Bot Types
Content-Based Features
• LDA Topic Probabilities
• Retweets
• User mentions
• Hashtags
• URLs
Diagram labels: Sophisticated bots, Simple bots, Domain adherence
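One plausible way to turn the retweet, mention, hashtag, and URL features into numbers is to compute, per account, the fraction of tweets that show each behavior. The sketch below assumes tweets are available as raw text strings; the regular expressions and the name `content_features` are illustrative assumptions, not the author's extraction code. LDA topic probabilities are not covered here.

```python
import re

# Simplified patterns; real tweet parsing would normally use the API's entity fields.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")


def content_features(tweets):
    """Return the fraction of an account's tweets that are retweets or that
    contain a mention, URL, or hashtag. `tweets` is a list of tweet strings."""
    n = len(tweets)
    if n == 0:
        return {"retweet": 0.0, "mention": 0.0, "url": 0.0, "hashtag": 0.0}
    return {
        # Old-style retweets start with "RT @user"; they also count as mentions below.
        "retweet": sum(t.startswith("RT @") for t in tweets) / n,
        "mention": sum(bool(MENTION_RE.search(t)) for t in tweets) / n,
        "url": sum(bool(URL_RE.search(t)) for t in tweets) / n,
        "hashtag": sum(bool(HASHTAG_RE.search(t)) for t in tweets) / n,
    }


sample = [
    "RT @alice: hello",
    "check http://example.com #news",
    "plain text",
    "hi @bob",
]
features = content_features(sample)
```

Expressing each feature as a share of the account's own tweets keeps very active and barely active accounts on the same scale.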

Bot Datasets
Property | Caverlee 2011 | Morstatter 2016 | Cresci 2017*
Tweets | 2,353,473 | 322,475 | 3,798,254
Retweets | 63,202 | 96,796 | 92,055
Accounts | 20,601 | 2,029 | 9,114
Active | 14,321 | 1,952 | 5,813
Suspended | 4,716 | 57 | 2,906
Deleted | 1,564 | 20 | 395
Labeling Approach | Honeypot | Honeypot | Manual Annotation
* The Cresci dataset is broken down further into datasets labeled: Traditional Spambots #1, Social Spambots #1, Social Spambots #2, Social Spambots #3, Fake Followers

Agenda
Motivation
Methods and Data
Experiments
Conclusion

Visualizing the Data
At first glance, the data appears somewhat separable.

Examining the Clusters
[Figure: Cluster Decompositions for k-Means (with k = 7). Bar chart over Cluster 1 to Cluster 7; y-axis from 0 to 14,000; series: Caverlee, Morstatter, Fake Followers, Social Spambots 1, Social Spambots 2, Social Spambots 3, Traditional Spambots 1.]
The Social Spambots datasets are separable from the rest, so they will be discarded.

Aggregated Dataset
The aggregated data is no longer trivially separable.

Choosing the Optimal Number of Clusters
Silhouette Plots for K-Means
It’s unclear which number of clusters is optimal.
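For readers unfamiliar with silhouette plots: each point gets a score (b − a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The following is a small standard-library sketch of the mean silhouette score under Euclidean distance; the function name and the toy data are assumptions for illustration.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is included in the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


square = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(square, [0, 0, 1, 1])  # groups the nearby pairs together
bad = silhouette_score(square, [0, 1, 0, 1])   # splits each nearby pair
```

Scores near 1 indicate compact, well-separated clusters; scores near 0 or below indicate overlapping or misassigned points. When several values of k give similar scores, the plots do not single out a best k.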

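Choosing the Optimal Number of Clusters
Caliński-Harabasz Index
Caliński-Harabasz Index = Variance Ratio Criterion: the ratio of the total between-cluster sum of squares to the total within-cluster sum of squares, each scaled by its degrees of freedom, for $N$ points in $k$ clusters:
$$\mathrm{CH}(k) = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}$$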
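As a worked illustration of this ratio, the sketch below computes the index with the standard library only. It uses the usual definition, in which the between-cluster sum of squares is divided by k − 1 and the within-cluster sum of squares by N − k. The function name and the toy data are assumptions made for this example.

```python
def calinski_harabasz(points, labels):
    """Variance ratio criterion: (SS_between / (k - 1)) / (SS_within / (N - k))."""
    n = len(points)
    dims = len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dims)]

    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= k < N clusters")

    between = 0.0
    within = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        # Between: squared distance of the centroid from the overall mean,
        # weighted by the cluster size.
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        # Within: squared distances of the members from their own centroid.
        within += sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members)
    # Note: a perfect clustering (within == 0) would divide by zero here.
    return (between / (k - 1)) / (within / (n - k))


square = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
ch_good = calinski_harabasz(square, [0, 0, 1, 1])
ch_bad = calinski_harabasz(square, [0, 1, 0, 1])
```

For the toy square, the sensible grouping gives SS_between = 100 and SS_within = 1, so the index is (100 / 1) / (1 / 2) = 200, while the poor grouping gives 0.02. Higher values indicate better-separated clusters.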

K-Means Results for Optimal Number of Clusters
Colors are not significant here

Determining Topic (Domain) Similarity
Word Cloud for K-Means Clusters with k = 4
All clusters share topics, so the bots within must participate in a similar domain.

Comparison of Cluster Features
[Figure: Feature Overview for K-Means Clusters with k = 4. Bar chart over Cluster 1 to Cluster 4; y-axis: Average % of Users’ Total Tweets (0% to 100%); series: Retweet, Mention, URL, Hashtag.]
Labels on the chart: Fake Followers, Simple bots, Sophisticated bots, Simple bots

Manual Analysis of Clusters
Panels: Simple bots 1, Fake Followers, Simple bots 2, Sophisticated bots

Current State of Bots in Clusters
[Figure: share of Active, Suspended, and Deleted accounts in each cluster; y-axis from 0% to 100%. Labels on the chart: Fake Followers, Simple bots.]
Cluster | Active | Suspended | Deleted | Total
1 | 8,818 (70%) | 3,076 (24%) | 685 (5%) | 12,579
2 | 4,101 (50%) | 3,570 (44%) | 479 (6%) | 8,150
3 | 1,212 (81%) | 207 (14%) | 84 (6%) | 1,503
4 | 3,706 (81%) | 471 (10%) | 414 (9%) | 4,591
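The percentages on this slide follow directly from the counts in the table. The short sketch below recomputes them; the dictionary layout and the function name `status_shares` are assumptions made for illustration.

```python
# Counts copied from the slide's table (cluster -> account status -> count).
CLUSTER_STATUS = {
    1: {"active": 8818, "suspended": 3076, "deleted": 685},
    2: {"active": 4101, "suspended": 3570, "deleted": 479},
    3: {"active": 1212, "suspended": 207, "deleted": 84},
    4: {"active": 3706, "suspended": 471, "deleted": 414},
}


def status_shares(counts):
    """Return each status as a whole-number percentage of the cluster total."""
    total = sum(counts.values())
    return {status: round(100 * n / total) for status, n in counts.items()}


shares = {cluster: status_shares(c) for cluster, c in CLUSTER_STATUS.items()}
```

Because each share is rounded independently, a row can sum to 99% or 101% (clusters 1 and 3 do). Cluster 2 stands out with roughly 44% of its accounts suspended, against 10% to 24% in the other clusters.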

Cluster Decompositions
Cluster | Caverlee | Morstatter | Fake Followers | Traditional Spambots #1 | Total
1 | 11,644 | 229 | 141 | 565 | 12,579
2 | 4,862 | 446 | 2,734 | 108 | 8,150
3 | 1,077 | 146 | 47 | 233 | 1,503
4 | 3,017 | 1,201 | 280 | 93 | 4,591
[Figure: bar chart of the same counts grouped by dataset (Caverlee, Morstatter, Fake Followers, Traditional Spambots #1); y-axis from 0 to 14,000. Labels on the chart: Fake Followers, Simple bots.]
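One way to read this table is to ask, for each source dataset, which cluster receives most of its accounts. The sketch below does that with the counts from the slide; the data layout and the helper name `dominant_cluster` are assumptions for illustration, and this is not necessarily how the author evaluated the clusters.

```python
# Counts copied from the slide's table (cluster -> dataset -> accounts).
DECOMPOSITION = {
    1: {"Caverlee": 11644, "Morstatter": 229, "Fake Followers": 141, "Traditional Spambots #1": 565},
    2: {"Caverlee": 4862, "Morstatter": 446, "Fake Followers": 2734, "Traditional Spambots #1": 108},
    3: {"Caverlee": 1077, "Morstatter": 146, "Fake Followers": 47, "Traditional Spambots #1": 233},
    4: {"Caverlee": 3017, "Morstatter": 1201, "Fake Followers": 280, "Traditional Spambots #1": 93},
}


def dominant_cluster(decomposition, dataset):
    """Return (cluster, share): the cluster holding most of `dataset`'s
    accounts and the fraction of the dataset that it holds."""
    total = sum(row[dataset] for row in decomposition.values())
    cluster = max(decomposition, key=lambda c: decomposition[c][dataset])
    return cluster, decomposition[cluster][dataset] / total


ff_cluster, ff_share = dominant_cluster(DECOMPOSITION, "Fake Followers")
mo_cluster, mo_share = dominant_cluster(DECOMPOSITION, "Morstatter")
```

With these counts, about 85% of the Fake Followers accounts (2,734 of 3,202) land in cluster 2, and about 59% of the Morstatter accounts (1,201 of 2,022) land in cluster 4.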

Temporal Analysis of Bot Clusters
[Figure: Caverlee - Tweets by Month. x-axis: Month (May 2007 to Aug 2010); y-axis: Number of Tweets (0 to 500,000).]
[Figure: Traditional Spambots #1 - Tweets by Month. x-axis: Month (Jun 2008 to Nov 2010); y-axis: Number of Tweets (0 to 16,000).]
Legend: Cluster 1 (Simple), Cluster 2 (Fake Followers), Cluster 3 (Simple), Cluster 4 (Sophisticated)

Temporal Analysis of Bot Clusters
[Figure: Morstatter - Tweets by Month. x-axis: Month (May 2011 to Mar 2015); y-axis: Number of Tweets (0 to 250,000).]
[Figure: Fake Followers - Tweets by Month. x-axis: Month (Oct 2008 to May 2013); y-axis: Number of Tweets (0 to 20,000).]
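The "Tweets by Month" charts amount to counting tweets per cluster and calendar month. A minimal sketch of that aggregation follows, assuming each tweet is available as a (cluster id, timestamp) pair; the input format and the name `tweets_by_month` are hypothetical.

```python
from collections import Counter
from datetime import datetime


def tweets_by_month(tweets):
    """Count tweets per (cluster, year, month).

    `tweets` is an iterable of (cluster_id, created_at) pairs, where
    created_at is a datetime.
    """
    counts = Counter()
    for cluster, created_at in tweets:
        counts[(cluster, created_at.year, created_at.month)] += 1
    return counts


# Made-up timestamps, used only to show the shape of the output.
example = [
    (1, datetime(2009, 5, 3)),
    (1, datetime(2009, 5, 20)),
    (2, datetime(2009, 5, 7)),
    (1, datetime(2009, 6, 1)),
]
monthly = tweets_by_month(example)
```

Plotting one line per cluster from these counts gives a chart of the kind shown on the slides.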

Agenda
Motivation
Methods and Data
Experiments
Conclusion

Conclusion
Typology
• Used the typology presented by Yang et al. with 4 types of bots: simple bots, fake followers, sophisticated bots, and botnets
Dataset
• Combined the Caverlee, Morstatter, and Cresci datasets to form a new dataset
• Checked the data for bias and verified that it fell into a single domain
Types of Bots
• Used K-Means, GMM, and Ward Hierarchical Clustering to group bots
• Assigned labels to clusters based on the previous bot typology
• Evaluated cluster results using the source datasets as ground truth and manual annotation

References
Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, and Maurizio Tesconi. “The Paradigm-Shift of Social Spambots: Evidence, Theories, and Tools for the Arms Race”. In: Proceedings of the 26th International Conference on World Wide Web Companion. 2017, pp. 963–972.
Kyumin Lee, Brian David Eoff, and James Caverlee. “Seven Months with the Devils: A Long-Term Study of Content Polluters on Twitter”. In: Proceedings of the 5th International Conference on Web and Social Media (ICWSM). The AAAI Press, 2011, pp. 185–192.
Fred Morstatter, Liang Wu, Tahora H. Nazer, Kathleen M Carley, and Huan Liu. “A New Approach to Bot Detection: Striking the Balance between Precision and Recall”. In: ASONAM. IEEE. 2016, pp. 533–540.
Stefan Stieglitz, Florian Brachten, Björn Ross, and Anna-Katharina Jung. “Do Social Bots Dream of Electric Sheep? A Categorization of Social Media Bot Accounts”. In: CoRR (2017), pp. 1–11.
Kai-Cheng Yang, Onur Varol, Clayton A. Davis, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer. “Arming the public with artificial intelligence to counter social bots”. In: Human Behavior and Emerging Technologies (2019), e115.

Publications
Tahora Hossein Nazer, Matthew Davis, Mansooreh Karami, Leman Akoglu, David Koelle, and Huan Liu. “Bot Detection: Will Focusing on Recall Cause Overall Performance Deterioration”. In: International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation. Springer. 2019, pp. 39–49.

Acknowledgements
• Data Mining and Machine Learning Lab
• Research Grant from ONR: N000141812108
Dr. Huan Liu
Dr. Guoliang Xue
Dr. Fred Morstatter, University of Southern California

Contributions
1. Summarized existing research on types of bots
2. Created an aggregated dataset and performed experiments to ensure that the dataset did not contain any bias or multiple domains
3. Experimentally demonstrated that unsupervised machine learning can separate bots into types
