SlideShare a Scribd company logo
1 of 12
DETECTING AUTOMATICALLY MANAGED
ACCOUNTS IN ONLINE SOCIAL NETWORKS:
GRAPH EMBEDDING APPROACH
Ilia Karpov (karpovilia@gmail.com)
Ekaterina Glazkova (catherine.glazkova@gmail.com)
Moscow, 2020
BOT ACCOUNT EXAMPLES
Catch Me if You Can
Detecting Automatically Managed Accounts in OSN
DATA COLLECTION
Defining Bot account
Detecting Automatically Managed Accounts in OSN
1.Manual annotation
2.Suspended users lists
3.Honeypots
Existing approaches*
* F. Morstatter et all: A New Approach to Bot Detection: Striking the Balance between Precision and Recall (2016)
A bot is an account created and used to generate profit for the owner by violating the rules of a social network by automatic methods
1.Account exchanges monitoring
2.Suspended users lists
3.Induction based search**
Proposed approach
CLASSIFICATION PROBLEM
Profile features
Detecting Automatically Managed Accounts in OSN
CLASSIFICATION PROBLEM
Profile model
Detecting Automatically Managed Accounts in OSN
• country_id
• personal_people_main
• city_title
• sex
• personal_langs
• counters_gifts
• mobile_phone
• counters_pages
• personal_alcohol
• is_closed
• last_seen_platform
• home_phone
• relation_partner_first_name
• relation
• counters_followers
• domain
• occupation_id
• counters_subscriptions
• personal_smoking
• movies
• occupation_name
• counters_photos
• counters_videos
• city_id
• bdate
• university
• counters_audios
• last_seen_time
• faculty
• counters_user_photos
• counters_groups
• has_photo
Selected static features Selected network features
• friend_id
EMBEDDING GENERATION
Node2Vec
Detecting Automatically Managed Accounts in OSN
* A. Grover: node2vec: Scalable Feature Learning for Networks (2016)
EMBEDDING GENERATION
Attri2Vec
Detecting Automatically Managed Accounts in OSN
* Zhang et al: Attributed network embedding via subspace discovery (2019)
Detecting Automatically Managed Accounts in OSN
p = 0.25 p = 0.5 p = 1 p = 2 p = 4
q = 0.25 0.727 0.823 0.751 0.753 0.793
q = 0.5 0.750 0.795 0.796 0.806 0.754
q = 1 0.771 0.804 0.765 0.788 0.772
q = 2 0.747 0.742 0.808 0.764 0.779
q = 4 0.776 0.724 0.745 0.709 0.793
p = 0.25 p = 0.5 p = 1 p = 2 p = 4
q = 0.25 0.856 0.814 0.804 0.823 0.780
q = 0.5 0.787 0.768 0.813 0.799 0.822
q = 1 0.863 0.812 0.847 0.829 0.808
q = 2 0.821 0.931 0.776 0.793 0.848
CLASSIFICATION PROBLEM
LogReg Classification ROC AUC based on N2V embedding
Sophisticated accounts
Technical accounts
Detecting Automatically Managed Accounts in OSNCLASSIFICATION PROBLEM
Classification ROC AUC
Technical accounts Sophisticated accounts
Attri2Vec 0.988 0.684
Node2Vec 0.93 0.87
Static 0.85 0.81
N2V + SF 0.934 0.91
• Support Vector Classifier (SVC)
• Random Forest (RF)
• Logistic Regression (LogReg)
Classifiers evaluation
Model results
Detecting Automatically Managed Accounts in OSNCLASSIFICATION PROBLEM
Comparison with existing approaches
Technical accounts Sophisticated accounts
AUC ROC 0.988 0.867
Zegzhda et.al. --- 0.73
Skorniakov et.al. --- 0.820
• Two bot detection datasets with anonymised data *
• More than 80 network embedding trainings with different parameters.
• Classifiers on embeddings obtained with network embedding.
• Classifiers based on static features.
• Classifiers on the concatenation of static features and embeddings.
Contributions
* https://github.com/karpovilia/botdetection
Detecting Automatically Managed Accounts in OSN
FUTURE RESEARCH
• use of text embedding - a significant part of artificial accounts performs the
function of promoting certain goods or disseminating information, which can be
used for classification;
• significant number of accounts hide their friends, but leave open groups that can
be used to model a user as a bipartite graph node;
• network modeling as a temporal network is of interest, taking into account such
characteristics as the joint appearance of accounts on the network
Questions?
Ilia Karpov (karpovilia@gmail.com)
Ekaterina Glazkova (catherine.glazkova@gmail.com)

More Related Content

Similar to Detecting Automatically Managed Accounts in Online Social Networks: Graph Embedding Approach

KPI definition with Business Activity Monitor 2.0
KPI definition with Business Activity Monitor 2.0KPI definition with Business Activity Monitor 2.0
KPI definition with Business Activity Monitor 2.0
WSO2
 

Similar to Detecting Automatically Managed Accounts in Online Social Networks: Graph Embedding Approach (20)

Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for ...
Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for ...Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for ...
Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for ...
 
3452 - Managing your applications
3452 - Managing your applications3452 - Managing your applications
3452 - Managing your applications
 
How to fully automate a store.pptx
How to fully automate a store.pptxHow to fully automate a store.pptx
How to fully automate a store.pptx
 
Driving Insights in the Digital Enterprise
Driving Insights in the Digital EnterpriseDriving Insights in the Digital Enterprise
Driving Insights in the Digital Enterprise
 
CREATE STATISTICS - what is it for?
CREATE STATISTICS - what is it for?CREATE STATISTICS - what is it for?
CREATE STATISTICS - what is it for?
 
Monitoring modern applications: Introduction to AWS xray
Monitoring modern applications: Introduction to AWS xrayMonitoring modern applications: Introduction to AWS xray
Monitoring modern applications: Introduction to AWS xray
 
Cubes 1.0 Overview
Cubes 1.0 OverviewCubes 1.0 Overview
Cubes 1.0 Overview
 
미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
 
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics PlatformWSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
 
Big Data Application Architectures - IoT
Big Data Application Architectures - IoTBig Data Application Architectures - IoT
Big Data Application Architectures - IoT
 
Example-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud DetectionExample-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud Detection
 
PST Labs presentation general
PST Labs presentation generalPST Labs presentation general
PST Labs presentation general
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStore
 
From measurement to knowledge with sofia2 Platform
From measurement to knowledge with sofia2 PlatformFrom measurement to knowledge with sofia2 Platform
From measurement to knowledge with sofia2 Platform
 
Metrics that every startup should know
Metrics that every startup should knowMetrics that every startup should know
Metrics that every startup should know
 
Whose Stack Is It Anyway?
Whose Stack Is It Anyway?Whose Stack Is It Anyway?
Whose Stack Is It Anyway?
 
Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine ...
Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine ...Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine ...
Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine ...
 
Real-time user profiling based on Spark streaming and HBase by Arkadiusz Jach...
Real-time user profiling based on Spark streaming and HBase by Arkadiusz Jach...Real-time user profiling based on Spark streaming and HBase by Arkadiusz Jach...
Real-time user profiling based on Spark streaming and HBase by Arkadiusz Jach...
 
KPI definition with Business Activity Monitor 2.0
KPI definition with Business Activity Monitor 2.0KPI definition with Business Activity Monitor 2.0
KPI definition with Business Activity Monitor 2.0
 
(DEV309) Large-Scale Metrics Analysis in Ruby
(DEV309) Large-Scale Metrics Analysis in Ruby(DEV309) Large-Scale Metrics Analysis in Ruby
(DEV309) Large-Scale Metrics Analysis in Ruby
 

Recently uploaded

obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
yulianti213969
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
acoha1
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
23050636
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
pwgnohujw
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
yulianti213969
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
fztigerwe
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Valters Lauzums
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
dq9vz1isj
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Stephen266013
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
zifhagzkk
 

Recently uploaded (20)

MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI  MANAJEMEN OF PENYAKIT TETANUS.pptMATERI  MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
 
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
Data Analysis Project Presentation : NYC Shooting Cluster Analysis
Data Analysis Project Presentation : NYC Shooting Cluster AnalysisData Analysis Project Presentation : NYC Shooting Cluster Analysis
Data Analysis Project Presentation : NYC Shooting Cluster Analysis
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic information
 

Detecting Automatically Managed Accounts in Online Social Networks: Graph Embedding Approach

  • 1. DETECTING AUTOMATICALLY MANAGED ACCOUNTS IN ONLINE SOCIAL NETWORKS: GRAPH EMBEDDING APPROACH Ilia Karpov (karpovilia@gmail.com) Ekaterina Glazkova (catherine.glazkova@gmail.com) Moscow, 2020
  • 2. BOT ACCOUNT EXAMPLES Catch Me if You Can Detecting Automatically Managed Accounts in OSN
  • 3. DATA COLLECTION Defining Bot account Detecting Automatically Managed Accounts in OSN 1.Manual annotation 2.Suspended users lists 3.Honeypots Existing approaches* * F. Morstatter et all: A New Approach to Bot Detection: Striking the Balance between Precision and Recall (2016) A bot is an account created and used to generate profit for the owner by violating the rules of a social network by automatic methods 1.Account exchanges monitoring 2.Suspended users lists 3.Induction based search** Proposed approach
  • 4. CLASSIFICATION PROBLEM Profile features Detecting Automatically Managed Accounts in OSN
  • 5. CLASSIFICATION PROBLEM Profile model Detecting Automatically Managed Accounts in OSN • country_id • personal_people_main • city_title • sex • personal_langs • counters_gifts • mobile_phone • counters_pages • personal_alcohol • is_closed • last_seen_platform • home_phone • relation_partner_first_name • relation • counters_followers • domain • occupation_id • counters_subscriptions • personal_smoking • movies • occupation_name • counters_photos • counters_videos • city_id • bdate • university • counters_audios • last_seen_time • faculty • counters_user_photos • counters_groups • has_photo Selected static features Selected network features • friend_id
  • 6. EMBEDDING GENERATION Node2Vec Detecting Automatically Managed Accounts in OSN * A. Grover: node2vec: Scalable Feature Learning for Networks (2016)
  • 7. EMBEDDING GENERATION Attri2Vec Detecting Automatically Managed Accounts in OSN * Zhang et al: Attributed network embedding via subspace discovery (2019)
  • 8. Detecting Automatically Managed Accounts in OSN p = 0.25 p = 0.5 p = 1 p = 2 p = 4 q = 0.25 0.727 0.823 0.751 0.753 0.793 q = 0.5 0.750 0.795 0.796 0.806 0.754 q = 1 0.771 0.804 0.765 0.788 0.772 q = 2 0.747 0.742 0.808 0.764 0.779 q = 4 0.776 0.724 0.745 0.709 0.793 p = 0.25 p = 0.5 p = 1 p = 2 p = 4 q = 0.25 0.856 0.814 0.804 0.823 0.780 q = 0.5 0.787 0.768 0.813 0.799 0.822 q = 1 0.863 0.812 0.847 0.829 0.808 q = 2 0.821 0.931 0.776 0.793 0.848 CLASSIFICATION PROBLEM LogReg Classification ROC AUC based on N2V embedding Sophisticated accounts Technical accounts
  • 9. Detecting Automatically Managed Accounts in OSNCLASSIFICATION PROBLEM Classification ROC AUC Technical accounts Sophisticated accounts Attri2Vec 0.988 0.684 Node2Vec 0.93 0.87 Static 0.85 0.81 N2V + SF 0.934 0.91 • Support Vector Classifier (SVC) • Random Forest (RF) • Logistic Regression (LogReg) Classifiers evaluation Model results
  • 10. Detecting Automatically Managed Accounts in OSNCLASSIFICATION PROBLEM Comparison with existing approaches Technical accounts Sophisticated accounts AUC ROC 0.988 0.867 Zegzhda et.al. --- 0.73 Skorniakov et.al. --- 0.820 • Two bot detection datasets with anonymised data * • More than 80 network embedding trainings with different parameters. • Classifiers on embeddings obtained with network embedding. • Classifiers based on static features. • Classifiers on the concatenation of static features and embeddings. Contributions * https://github.com/karpovilia/botdetection
  • 11. Detecting Automatically Managed Accounts in OSN FUTURE RESEARCH • use of text embedding - a significant part of artificial accounts performs the function of promoting certain goods or disseminating information, which can be used for classification; • significant number of accounts hide their friends, but leave open groups that can be used to model a user as a bipartite graph node; • network modeling as a temporal network is of interest, taking into account such characteristics as the joint appearance of accounts on the network
  • 12. Questions? Ilia Karpov (karpovilia@gmail.com) Ekaterina Glazkova (catherine.glazkova@gmail.com)

Editor's Notes

  1. artificial accounts distort the popularity of groups, spreadfake news, are used for fraud activities
  2. We are going to analyse both network and static features