BAHIR DAR UNIVERSITY
BAHIR DAR INSTITUTE OF TECHNOLOGY
SCHOOL OF RESEARCH AND POST GRADUATE STUDIES
FACULTY OF COMPUTING
Department of Information Technology
Graduate Seminar
Title: Twitter Fake Account Detection
Submitted by: Fentanesh Bezie BDU1300731
fantiebez@gmail.com
Submitted to: Mr. Arham D (Asst. Prof.)
MAY 11, 2021
BAHIR DAR, ETHIOPIA
Contents
1. Introduction
2. Objective
3. Methodology
3.1 Tools and Techniques
3.1.1 Naïve Bayes algorithm
3.1.2 Entropy minimization discretization (EMD) technique
3.2 Dataset preparation
4. Critiques
4.1 Strong side
4.2 Weakness
References
Abstract
Millions of people use social networking sites like Twitter and Facebook, and their interactions
with these sites have influenced their lives. This prevalence has given rise to a number of
problems, including users' exposure to false information through fake accounts and the resulting
spread of malicious material, which can cause significant real-world harm to society. The
researchers investigated and presented a method for detecting fake Twitter accounts. They analyzed
the results of the Naïve Bayes algorithm after preprocessing the numerical features of their
dataset with a supervised discretization technique known as Entropy Minimization Discretization
(EMD).
Keywords: machine learning; social media; Twitter; spam detection; fake detection
1. Introduction
Over the past two decades, social networking has grown explosively. The various forms of social
networking have spawned a wide range of online activities that attract a vast number of users.
At the same time, these platforms suffer from a growing number of fake accounts.
The term "fake accounts" refers to accounts that do not belong to real people. Fake accounts can
spread false information, deceptive web ratings, and spam. Users behind fake accounts engage in
prohibited behavior and break Twitter's rules. Such behavior includes automated account
interactions and attempts to deceive or confuse people, for example posting harmful links,
creating multiple accounts, posting repeatedly on the same subject or posting duplicate content,
posting links with unrelated tweets, and abusing the reply and mention features.
Real accounts are those that follow Twitter's rules. Tweets may be sent as e-mail attachments or
as SMS text messages. Twitter allows users to send and receive short messages (originally limited
to 140 characters, raised to 280 in 2017) directly from their smartphones or through a variety of
Web-based services. Twitter thus disseminates information to a vast number of users in real time.
Because millions of people use social networking sites like Twitter and Facebook, fake accounts
can expose a large audience to false information and spread malicious material. This can cause
serious real-world harm to society.
Spammers are a major problem on social media because they can exploit their accounts for a variety
of purposes. One such purpose is spreading rumors, which can have a significant impact on a
specific company or even a whole community. Given the importance of social media's impact on
society, the researchers detect fake profile accounts on the Twitter online social network in
order to prevent the dissemination of fake news, advertisements, and fake followers.
2. Objective
The aim of this study is to identify fake Twitter profile accounts in order to prevent the
dissemination of false information, advertising, and fake followers.
3. Methodology
3.1 Tools and Techniques
The researchers use the Naïve Bayes algorithm and the Entropy Minimization Discretization (EMD)
technique.
3.1.1 Naïve Bayes algorithm
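Naïve Bayes is a supervised learning algorithm that is fast and simple to understand. It performs
well in multi-class prediction and tends to work better with categorical input variables than with
numerical ones. Because Naïve Bayes classifiers have a high success rate in text classification,
they are commonly used in spam filtering and sentiment analysis.
The key assumption is that the predictive attributes are conditionally independent given the
class. Let C be a random variable that represents an instance's class, and X be a vector of random
variables that represents the attribute values. Let c stand for a specific class label, and x for
a specific vector of attribute values. By Bayes' theorem, the probability of class c given the
observed attribute values x is given by Equation (1):
$$P(C = c \mid X = x) = \frac{P(C = c)\, P(X = x \mid C = c)}{P(X = x)} \qquad (1)$$
Under the independence assumption, P(X = x | C = c) factorizes into the product of the
per-attribute probabilities P(X_i = x_i | C = c).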
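As an illustration of the classifier described above, the following is a minimal sketch of a categorical Naïve Bayes model: it picks the class c that maximizes P(C = c) multiplied by the product of the per-attribute probabilities P(X_i = x_i | C = c). The Laplace smoothing parameter `alpha`, the function names, and the toy accounts are assumptions made for this example; they are not taken from the reviewed paper.

```python
import math
from collections import Counter


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class and per-attribute value frequencies (alpha = Laplace smoothing)."""
    class_counts = Counter(labels)
    n_features = len(rows[0])
    # value_counts[c][j][v] = number of class-c rows whose j-th attribute equals v
    value_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    feature_values = [set() for _ in range(n_features)]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
            feature_values[j].add(v)
    return class_counts, value_counts, feature_values, alpha


def predict(model, row):
    """Return the class with the highest log-posterior under the independence assumption."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior P(C = c)
        for j, v in enumerate(row):
            numerator = value_counts[c][j][v] + alpha
            denominator = n_c + alpha * len(feature_values[j])
            score += math.log(numerator / denominator)  # log P(X_j = v | C = c)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy data: (has_profile_image, follower_bin)
rows = [("no", "low"), ("no", "low"), ("yes", "low"),
        ("yes", "high"), ("yes", "high"), ("no", "high")]
labels = ["fake", "fake", "fake", "real", "real", "real"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("no", "low")))    # → fake
print(predict(model, ("yes", "high")))  # → real
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied together.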
3.1.2 Entropy minimization discretization (EMD) technique
EMD is a supervised discretization technique. It evaluates candidate cut points, which are the
midpoints of each pair of adjacent values in the sorted data. To evaluate a cut point, the data is
divided into two intervals and the class information entropy of the resulting partition is
calculated. The candidate with the minimum entropy is selected. The process is applied
recursively to each interval, always selecting the best cut point, and the minimum description
length (MDL) principle is used to decide when to stop. The researchers chose this technique for
their experiments because of its success. Given a set of instances S, a feature A, and a
partition boundary T that splits S into subsets S_1 and S_2, the class information entropy of the
partition induced by T, E(A, T, S), is given by Equation (2):
$$E(A, T, S) = \frac{|S_1|}{|S|}\,\mathrm{Ent}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{Ent}(S_2) \qquad (2)$$
where Ent(S_i) is the class entropy of subset S_i.
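The cut-point search can be sketched as follows. This is a minimal, single-level illustration that scores every midpoint between adjacent distinct values by the weighted class entropy of the two resulting intervals and returns the best one. It deliberately omits the recursion and the MDL stopping criterion, and the function names and follower counts are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Class entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (weighted_entropy, cut) for the midpoint with minimum class information entropy."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no boundary between identical values
        cut = (low + high) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Size-weighted entropy of the two intervals induced by the cut
        e = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if best is None or e < best[0]:
            best = (e, cut)
    return best


# Hypothetical follower counts for three fake and three real accounts
followers = [3, 5, 8, 120, 150, 400]
classes = ["fake", "fake", "fake", "real", "real", "real"]
_, cut = best_cut_point(followers, classes)
print(cut)  # → 64.0
```

In the full technique, the same search would be repeated inside each resulting interval until the MDL criterion rejects further splits.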
3.2 Dataset preparation
To run the experiments, the researchers created their own dataset using the Twitter API. The API
gives access to Twitter data such as tweets and several of their attributes. Requests can be made
to the Twitter API from a server-side scripting language, and the results are returned in JSON
format, which is easy to parse. There are four main objects in the Twitter API: Tweets, Users,
Entities, and Places. Each of these objects has many attributes. The researchers selected 16
attributes as features for their Naïve Bayes learning algorithm.
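To make the JSON step concrete, here is a small sketch of turning one user object into a feature dictionary. The report does not list the 16 attributes that were actually selected, so the fields below are only illustrative; the sample account, the derived `follower_friend_ratio`, and the function name are assumptions for this example.

```python
import json

# Hypothetical user object, shaped like a JSON response for a single account
sample_user_json = """
{
  "screen_name": "example_user",
  "description": "",
  "followers_count": 12,
  "friends_count": 950,
  "statuses_count": 40,
  "default_profile_image": true
}
"""


def extract_features(user_json):
    """Parse a user object and return a few illustrative account-level features."""
    user = json.loads(user_json)
    followers = user.get("followers_count", 0)
    friends = user.get("friends_count", 0)
    return {
        "followers_count": followers,
        "friends_count": friends,
        "statuses_count": user.get("statuses_count", 0),
        "has_description": bool(user.get("description")),
        "default_profile_image": bool(user.get("default_profile_image", False)),
        # Derived feature (an assumption, not from the paper)
        "follower_friend_ratio": followers / friends if friends else 0.0,
    }


features = extract_features(sample_user_json)
print(features["has_description"])  # → False
```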
The dataset was labelled manually by three individuals, and only the accounts on which their
decisions agreed (the intersection of their decisions) were included. Class decisions were made
by examining the username, background image, profile image, follower and friend counts, account
description, number of tweets, and content of the tweets. In total, data for 501 fake and 499 real
accounts was collected. The evaluation metrics are accuracy, F-measure, and the confusion matrix.
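The labelling rule can be sketched in a few lines, assuming that "intersection" means keeping only the accounts on which all three annotators gave the same label. The account names and labels below are hypothetical.

```python
def unanimous_labels(annotations):
    """Keep only accounts for which every annotator assigned the same label."""
    return {
        account: labels[0]
        for account, labels in annotations.items()
        if len(set(labels)) == 1
    }


# Hypothetical decisions from three annotators per account
annotations = {
    "account_a": ["fake", "fake", "fake"],
    "account_b": ["fake", "real", "fake"],  # disagreement: dropped
    "account_c": ["real", "real", "real"],
}
print(unanimous_labels(annotations))  # → {'account_a': 'fake', 'account_c': 'real'}
```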
First experiment: applying the Naïve Bayes learning algorithm to the dataset using all attributes,
without discretization. In this experiment, 861 of the 1000 instances are classified correctly
(86.1% accuracy); 112 of the 501 fake accounts are classified as real and 27 of the 499 real
accounts are classified as fake. The weighted average F-measure is 0.860.
Second experiment: applying the Naïve Bayes learning algorithm to the dataset after
discretization. In this experiment, 909 of the 1000 instances are classified correctly (90.9%
accuracy, consistent with the 91 misclassifications reported); 60 of the 501 fake accounts are
classified as real and 31 of the 499 real accounts are classified as fake. The weighted average
F-measure increases to 0.909.
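The reported metrics can be recomputed from the misclassification counts above. The sketch below derives accuracy and the class-weighted F-measure from the confusion-matrix counts of each experiment; the function name and argument names are chosen for this example. For the second experiment, 60 + 31 = 91 misclassifications imply 909 correctly classified instances, which matches the stated 90.9% accuracy.

```python
def evaluate(fake_total, real_total, fake_as_real, real_as_fake):
    """Return (accuracy, weighted F-measure) from two-class confusion-matrix counts."""
    correct_fake = fake_total - fake_as_real
    correct_real = real_total - real_as_fake
    total = fake_total + real_total
    accuracy = (correct_fake + correct_real) / total
    # F1 = 2*TP / (number predicted as the class + number actually in the class)
    f_fake = 2 * correct_fake / ((correct_fake + real_as_fake) + fake_total)
    f_real = 2 * correct_real / ((correct_real + fake_as_real) + real_total)
    # Weight each class's F1 by its share of the dataset
    weighted_f = (fake_total * f_fake + real_total * f_real) / total
    return accuracy, weighted_f


acc1, f1 = evaluate(501, 499, fake_as_real=112, real_as_fake=27)
acc2, f2 = evaluate(501, 499, fake_as_real=60, real_as_fake=31)
print(round(acc1, 3), round(f1, 3))  # → 0.861 0.86
print(round(acc2, 3), round(f2, 3))  # → 0.909 0.909
```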
4. Critiques
4.1 Strong side
✓ The researchers draw most of their attributes from user objects, which contain account-wide
details. This makes it possible to spot fake accounts quickly.
4.2 Weakness
✓ The data was collected and labelled manually. Therefore, labelling errors may have occurred.
✓ The sample is small (1,000 accounts). With more data, performance would likely
improve.
References
Boshmaf, Y., et al. (February 2015). Íntegro: Leveraging Victim Prediction for Robust Fake
Account Detection in OSNs. 15, 8-11.
Erşahin, B., Aktaş, Ö., Kılınç, D., & Akyol, C. (2017). Twitter Fake Account Detection. IEEE.
Gurajala, S., White, J. S., Hudson, B., & Matthews, J. N. (July 27, 2015). Fake Twitter accounts:
Profile characteristics obtained using an activity-based pattern detection approach.
Kontsevoi, V., Lujan, N., & Orozco, A. (May 14, 2014). Detecting Subversion of Twitter.

More Related Content

Similar to Database Admin for Comp review seminar.pdf

DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGDETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGijcsit
 
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...ijaia
 
IRJET- An Experimental Evaluation of Mechanical Properties of Bamboo Fiber Re...
IRJET- An Experimental Evaluation of Mechanical Properties of Bamboo Fiber Re...IRJET- An Experimental Evaluation of Mechanical Properties of Bamboo Fiber Re...
IRJET- An Experimental Evaluation of Mechanical Properties of Bamboo Fiber Re...IRJET Journal
 
IRJET- Tweet Segmentation and its Application to Named Entity Recognition
IRJET- Tweet Segmentation and its Application to Named Entity RecognitionIRJET- Tweet Segmentation and its Application to Named Entity Recognition
IRJET- Tweet Segmentation and its Application to Named Entity RecognitionIRJET Journal
 
Classification Methods for Spam Detection in Online Social Network
Classification Methods for Spam Detection in Online Social NetworkClassification Methods for Spam Detection in Online Social Network
Classification Methods for Spam Detection in Online Social NetworkIRJET Journal
 
Seminar on detecting fake accounts in social media using machine learning
Seminar on detecting fake accounts in social media using machine learningSeminar on detecting fake accounts in social media using machine learning
Seminar on detecting fake accounts in social media using machine learningParvathi Sanil Nair
 
Classification of Disastrous Tweets on Twitter using BERT Model
Classification of Disastrous Tweets on Twitter using BERT ModelClassification of Disastrous Tweets on Twitter using BERT Model
Classification of Disastrous Tweets on Twitter using BERT ModelIRJET Journal
 
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITYFRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITYcscpconf
 
A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...IRJET Journal
 
A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...IRJET Journal
 
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET Journal
 
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...IRJET Journal
 
The Identification of Depressive Moods from Twitter Data by Using Convolution...
The Identification of Depressive Moods from Twitter Data by Using Convolution...The Identification of Depressive Moods from Twitter Data by Using Convolution...
The Identification of Depressive Moods from Twitter Data by Using Convolution...IRJET Journal
 
A Baseline Based Deep Learning Approach of Live Tweets
A Baseline Based Deep Learning Approach of Live TweetsA Baseline Based Deep Learning Approach of Live Tweets
A Baseline Based Deep Learning Approach of Live Tweetsijtsrd
 
IRJET- Twitter Spammer Detection
IRJET- Twitter Spammer DetectionIRJET- Twitter Spammer Detection
IRJET- Twitter Spammer DetectionIRJET Journal
 
Spammer Detection and Fake User Identification on Social Networks
Spammer Detection and Fake User Identification on Social NetworksSpammer Detection and Fake User Identification on Social Networks
Spammer Detection and Fake User Identification on Social NetworksIRJET Journal
 

Similar to Database Admin for Comp review seminar.pdf (20)

DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGDETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
 
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
 
IRJET- An Experimental Evaluation of Mechanical Properties of Bamboo Fiber Re...
IRJET- An Experimental Evaluation of Mechanical Properties of Bamboo Fiber Re...IRJET- An Experimental Evaluation of Mechanical Properties of Bamboo Fiber Re...
IRJET- An Experimental Evaluation of Mechanical Properties of Bamboo Fiber Re...
 
IRJET- Tweet Segmentation and its Application to Named Entity Recognition
IRJET- Tweet Segmentation and its Application to Named Entity RecognitionIRJET- Tweet Segmentation and its Application to Named Entity Recognition
IRJET- Tweet Segmentation and its Application to Named Entity Recognition
 
Classification Methods for Spam Detection in Online Social Network
Classification Methods for Spam Detection in Online Social NetworkClassification Methods for Spam Detection in Online Social Network
Classification Methods for Spam Detection in Online Social Network
 
Seminar on detecting fake accounts in social media using machine learning
Seminar on detecting fake accounts in social media using machine learningSeminar on detecting fake accounts in social media using machine learning
Seminar on detecting fake accounts in social media using machine learning
 
Classification of Disastrous Tweets on Twitter using BERT Model
Classification of Disastrous Tweets on Twitter using BERT ModelClassification of Disastrous Tweets on Twitter using BERT Model
Classification of Disastrous Tweets on Twitter using BERT Model
 
6356152.pdf
6356152.pdf6356152.pdf
6356152.pdf
 
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITYFRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
 
[IJET V2I4P9] Authors: Praveen Jayasankar , Prashanth Jayaraman ,Rachel Hannah
[IJET V2I4P9] Authors: Praveen Jayasankar , Prashanth Jayaraman ,Rachel Hannah[IJET V2I4P9] Authors: Praveen Jayasankar , Prashanth Jayaraman ,Rachel Hannah
[IJET V2I4P9] Authors: Praveen Jayasankar , Prashanth Jayaraman ,Rachel Hannah
 
F017433947
F017433947F017433947
F017433947
 
A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...
 
A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...
 
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
 
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
 
757
757757
757
 
The Identification of Depressive Moods from Twitter Data by Using Convolution...
The Identification of Depressive Moods from Twitter Data by Using Convolution...The Identification of Depressive Moods from Twitter Data by Using Convolution...
The Identification of Depressive Moods from Twitter Data by Using Convolution...
 
A Baseline Based Deep Learning Approach of Live Tweets
A Baseline Based Deep Learning Approach of Live TweetsA Baseline Based Deep Learning Approach of Live Tweets
A Baseline Based Deep Learning Approach of Live Tweets
 
IRJET- Twitter Spammer Detection
IRJET- Twitter Spammer DetectionIRJET- Twitter Spammer Detection
IRJET- Twitter Spammer Detection
 
Spammer Detection and Fake User Identification on Social Networks
Spammer Detection and Fake User Identification on Social NetworksSpammer Detection and Fake User Identification on Social Networks
Spammer Detection and Fake User Identification on Social Networks
 

More from amare lakew

10+ Proven It Consultant Interview Questions [+Answers].pdf
10+ Proven It Consultant Interview Questions [+Answers].pdf10+ Proven It Consultant Interview Questions [+Answers].pdf
10+ Proven It Consultant Interview Questions [+Answers].pdfamare lakew
 
Maintain Inventories of Hardware and Software.pdf
Maintain Inventories of Hardware and Software.pdfMaintain Inventories of Hardware and Software.pdf
Maintain Inventories of Hardware and Software.pdfamare lakew
 
Bahir Dar Data Mining Lab-Weka Edited.pdf
Bahir Dar Data Mining Lab-Weka Edited.pdfBahir Dar Data Mining Lab-Weka Edited.pdf
Bahir Dar Data Mining Lab-Weka Edited.pdfamare lakew
 
Maintenance for university course out line.docx
Maintenance for university  course out line.docxMaintenance for university  course out line.docx
Maintenance for university course out line.docxamare lakew
 
Better titles and descriptions lead to more readers
Better titles and descriptions lead to more readersBetter titles and descriptions lead to more readers
Better titles and descriptions lead to more readersamare lakew
 
grade 8-social studies.pdf for horizon p
grade 8-social studies.pdf for horizon pgrade 8-social studies.pdf for horizon p
grade 8-social studies.pdf for horizon pamare lakew
 
maintenance for Untitled presentation.pptx
maintenance for Untitled presentation.pptxmaintenance for Untitled presentation.pptx
maintenance for Untitled presentation.pptxamare lakew
 
It Maintenance for TVT collage (Horaizone)
It Maintenance for TVT collage (Horaizone)It Maintenance for TVT collage (Horaizone)
It Maintenance for TVT collage (Horaizone)amare lakew
 
maintenance of the equipment how we can clean.
maintenance of the equipment how we can clean.maintenance of the equipment how we can clean.
maintenance of the equipment how we can clean.amare lakew
 
Computer networks--network
Computer networks--networkComputer networks--network
Computer networks--networkamare lakew
 
Distributedsystems 090709113230-phpapp02
Distributedsystems 090709113230-phpapp02Distributedsystems 090709113230-phpapp02
Distributedsystems 090709113230-phpapp02amare lakew
 

More from amare lakew (11)

10+ Proven It Consultant Interview Questions [+Answers].pdf
10+ Proven It Consultant Interview Questions [+Answers].pdf10+ Proven It Consultant Interview Questions [+Answers].pdf
10+ Proven It Consultant Interview Questions [+Answers].pdf
 
Maintain Inventories of Hardware and Software.pdf
Maintain Inventories of Hardware and Software.pdfMaintain Inventories of Hardware and Software.pdf
Maintain Inventories of Hardware and Software.pdf
 
Bahir Dar Data Mining Lab-Weka Edited.pdf
Bahir Dar Data Mining Lab-Weka Edited.pdfBahir Dar Data Mining Lab-Weka Edited.pdf
Bahir Dar Data Mining Lab-Weka Edited.pdf
 
Maintenance for university course out line.docx
Maintenance for university  course out line.docxMaintenance for university  course out line.docx
Maintenance for university course out line.docx
 
Better titles and descriptions lead to more readers
Better titles and descriptions lead to more readersBetter titles and descriptions lead to more readers
Better titles and descriptions lead to more readers
 
grade 8-social studies.pdf for horizon p
grade 8-social studies.pdf for horizon pgrade 8-social studies.pdf for horizon p
grade 8-social studies.pdf for horizon p
 
maintenance for Untitled presentation.pptx
maintenance for Untitled presentation.pptxmaintenance for Untitled presentation.pptx
maintenance for Untitled presentation.pptx
 
It Maintenance for TVT collage (Horaizone)
It Maintenance for TVT collage (Horaizone)It Maintenance for TVT collage (Horaizone)
It Maintenance for TVT collage (Horaizone)
 
maintenance of the equipment how we can clean.
maintenance of the equipment how we can clean.maintenance of the equipment how we can clean.
maintenance of the equipment how we can clean.
 
Computer networks--network
Computer networks--networkComputer networks--network
Computer networks--network
 
Distributedsystems 090709113230-phpapp02
Distributedsystems 090709113230-phpapp02Distributedsystems 090709113230-phpapp02
Distributedsystems 090709113230-phpapp02
 

Recently uploaded

AIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptAIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptNishitharanjan Rout
 
Diuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdf
Diuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdfDiuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdf
Diuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdfKartik Tiwari
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
Tatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf artsTatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf artsNbelano25
 
Simple, Complex, and Compound Sentences Exercises.pdf
Simple, Complex, and Compound Sentences Exercises.pdfSimple, Complex, and Compound Sentences Exercises.pdf
Simple, Complex, and Compound Sentences Exercises.pdfstareducators107
 
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdfFICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdfPondicherry University
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Model Attribute _rec_name in the Odoo 17
Model Attribute _rec_name in the Odoo 17Model Attribute _rec_name in the Odoo 17
Model Attribute _rec_name in the Odoo 17Celine George
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...EduSkills OECD
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxannathomasp01
 
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...Nguyen Thanh Tu Collection
 
UGC NET Paper 1 Unit 7 DATA INTERPRETATION.pdf
UGC NET Paper 1 Unit 7 DATA INTERPRETATION.pdfUGC NET Paper 1 Unit 7 DATA INTERPRETATION.pdf
UGC NET Paper 1 Unit 7 DATA INTERPRETATION.pdfNirmal Dwivedi
 
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...MysoreMuleSoftMeetup
 
How to Manage Call for Tendor in Odoo 17
How to Manage Call for Tendor in Odoo 17How to Manage Call for Tendor in Odoo 17
How to Manage Call for Tendor in Odoo 17Celine George
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 

Recently uploaded (20)

AIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptAIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.ppt
 
Diuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdf
Diuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdfDiuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdf
Diuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdf
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Tatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf artsTatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf arts
 
OS-operating systems- ch05 (CPU Scheduling) ...
OS-operating systems- ch05 (CPU Scheduling) ...OS-operating systems- ch05 (CPU Scheduling) ...
OS-operating systems- ch05 (CPU Scheduling) ...
 
Simple, Complex, and Compound Sentences Exercises.pdf
Simple, Complex, and Compound Sentences Exercises.pdfSimple, Complex, and Compound Sentences Exercises.pdf
Simple, Complex, and Compound Sentences Exercises.pdf
 
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdfFICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Model Attribute _rec_name in the Odoo 17
  • 1. BAHIR DAR UNIVERSITY BAHIR DAR INSTITUTE OF TECHNOLOGY SCHOOL OF RESEARCH AND POST GRADUATE STUDIES FACULTY OF COMPUTING Department of Information Technology Graduate Seminar Title: Twitter Fake Account Detection Submitted by: Fentanesh Bezie BDU1300731 fantiebez@gmail.com Submitted to: Mr. Arham D (Asst. Prof.) MAY 11, 2021 BAHIR DAR, ETHIOPIA
  • 2. Contents
    1. Introduction
    2. Objective
    3. Methodology
    3.1 Tools and Techniques
    3.1.1 Naïve Bayes algorithm
    3.1.2 Entropy minimization discretization (EMD) technique
    3.2 Dataset preparation
    4. Critiques
    4.1 Strong side
    4.2 Weakness
    References
  • 3. Abstract
    Millions of people use social networking sites such as Twitter and Facebook, and their interactions with these sites have influenced their lives. This prevalence has created a number of problems, including the exposure of users to false information through fake accounts and the resulting spread of malicious material, which can cause significant harm to society in the real world. The researchers investigated and presented a method for detecting fake Twitter accounts. They analyzed the results of the Naïve Bayes algorithm after preprocessing the numerical features of their dataset with a supervised discretization technique known as Entropy Minimization Discretization (EMD).
    Keywords: machine learning; social media; Twitter; spam detection; fake detection
  • 4. 1. Introduction
    Over the past two decades, social networking has exploded. Its various forms have given rise to a wide range of online activities that attract a vast number of users. At the same time, these platforms suffer from a growing number of fake accounts. The term "fake accounts" refers to accounts that do not belong to real people. Fake accounts can spread false information, deceptive web ratings, and spam. Users behind fake accounts engage in prohibited behavior and break Twitter's rules. Such behavior includes automated account interactions and attempts to deceive or confuse people, for example posting harmful links, creating multiple accounts, posting repeatedly on the same subject or duplicating posts, posting links with unrelated tweets, and abusing the reply and mention features. Real accounts are those that follow Twitter's rules.
    Tweets may be sent as e-mail attachments or as SMS text messages. Twitter allows users to send and receive 140-character messages directly from their smartphones through a variety of Web-based services, and it disseminates information to a vast number of users in real time.
    Millions of people use social networking sites such as Twitter and Facebook, and their interactions with these sites have influenced their lives. This prevalence has created a number of problems, including the exposure of users to false information through fake accounts and the resulting spread of malicious material. This can cause great damage to society in the real world. Spammers are a major issue on social media because they can use their accounts for a variety of purposes. One of these is spreading rumors, which can have a significant impact on a specific company or even a whole community.
    Given the impact of social media on society, the researchers detect fake profile accounts on the Twitter online social network in order to prevent the dissemination of fake news, advertisements, and fake followers.
    2. Objective
    The aim of this study is to identify fake Twitter profile accounts in order to prevent the dissemination of false information, advertising, and fake followers.
  • 5. 3. Methodology
    3.1 Tools and Techniques
    The researchers use the Naïve Bayes algorithm and the Entropy Minimization Discretization (EMD) technique.
    3.1.1 Naïve Bayes algorithm
    Naïve Bayes is a supervised learning algorithm that is fast and simple to understand. It performs well in multi-class prediction and generally handles categorical input variables better than numerical ones. Because Naïve Bayes classifiers have a high success rate in text classification, they are commonly used in spam filtering and sentiment analysis. The predictive attributes are assumed to be conditionally independent given the class. Let C be a random variable that represents an instance's class, and X be a vector of random variables that represents the attribute values. Let c stand for a specific class label, and x for a specific vector of attribute values. Under the independence assumption, the classifier picks the class c that maximizes $p(C = c) \prod_i p(X_i = x_i \mid C = c)$.
    3.1.2 Entropy minimization discretization (EMD) technique
    EMD is a supervised discretization technique. It evaluates candidate cut points, which are the midpoints between each pair of adjacent values in the sorted data. To evaluate a cut point, the data are divided into two intervals and the class information entropy of the resulting partition is calculated. The candidate with the minimum entropy is selected. This process is applied recursively, always selecting the best cut point, and a minimum description length (MDL) criterion decides when to stop. The researchers used this technique in their experiments because of its success. Given a set of instances S, a feature A, and a partition boundary T, the class information entropy of the partition induced by T, E(A, T, S), is given by Equation (2):
    $E(A, T, S) = \frac{|S_1|}{|S|}\,\mathrm{Ent}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{Ent}(S_2)$    (2)
    where $S_1$ and $S_2$ are the subsets of S on either side of T and Ent denotes the class entropy of a subset.
    3.2 Dataset preparation
    For the experiments, the researchers created their own dataset using the Twitter API. The Twitter API provides access to Twitter data such as tweets and several of their attributes. Requests can be made to the API from a server-side scripting language, and the results are returned in JSON format, which is easy to read. There are four main objects in the Twitter API.
These are:
  • 6. Tweets, Users, Entities and Places. Each of these objects has many attributes. The researchers selected 16 attributes as features for their Naïve Bayes learning algorithm. The dataset was labeled manually by three individuals, and only the accounts on which all three agreed were included. Class decisions were made by examining the username, background image, profile image, follower and friend counts, account description, number of tweets, and content of the tweets. In total, data for 501 fake and 499 real accounts were collected. The evaluation metrics are accuracy, F-measure, and the confusion matrix.
    First experiment: applying the Naïve Bayes learning algorithm to the dataset using all attributes without discretization. 861 of the 1000 instances are classified correctly (86.1% accuracy); 112 of the 501 fake accounts are classified as real and 27 of the 499 real accounts are classified as fake. The weighted average F-measure is 0.860.
    Second experiment: applying the Naïve Bayes learning algorithm to the dataset after discretization. 909 of the 1000 instances are classified correctly (90.9% accuracy); 60 of the 501 fake accounts are classified as real and 31 of the 499 real accounts are classified as fake. The weighted average F-measure increases to 0.909.
    4. Critiques
    4.1 Strong side
    ✓ The researchers use more attributes from user objects, which contain account-wide details; this makes it possible to spot fake accounts quickly.
    4.2 Weakness
    ✓ The data were collected manually, so labeling errors may have occurred.
    ✓ The sample is small; with more collected data, performance would likely improve.
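    The cut-point selection described in Section 3.1.2 can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation: it performs a single split only and leaves out the recursive application and the MDL stopping criterion. The function names and the toy follower counts are hypothetical and were made up for the example.

```python
from collections import Counter
from math import log2


def class_entropy(labels):
    """Shannon entropy (in bits) of the class labels in one interval."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def partition_entropy(values, labels, cut):
    """E(A, T, S): size-weighted class entropy of the two intervals split at `cut`."""
    left = [y for v, y in zip(values, labels) if v <= cut]
    right = [y for v, y in zip(values, labels) if v > cut]
    n = len(labels)
    return len(left) / n * class_entropy(left) + len(right) / n * class_entropy(right)


def best_cut_point(values, labels):
    """Return (cut, entropy) for the candidate midpoint with minimum E(A, T, S)."""
    distinct = sorted(set(values))
    # Candidate cut points are the midpoints between adjacent sorted values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None
    scored = ((c, partition_entropy(values, labels, c)) for c in candidates)
    return min(scored, key=lambda pair: pair[1])


# Hypothetical toy feature (follower counts) with made-up class labels.
followers = [3, 5, 8, 120, 150, 400]
classes = ["fake", "fake", "fake", "real", "real", "real"]
cut, entropy = best_cut_point(followers, classes)
print(cut, entropy)
```

    In this toy data the midpoint between 8 and 120 separates the two classes perfectly, so it is the selected cut point. A full EMD implementation would repeat the search inside each resulting interval until the MDL criterion rejects further splits.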
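    The accuracy and F-measure figures for the two experiments can be re-derived from the misclassification counts alone. The sketch below assumes that the "weighted average" F-measure weights each class by its number of instances; the report does not say this explicitly, and the function name `evaluate` is hypothetical.

```python
def evaluate(n_fake, n_real, fake_as_real, real_as_fake):
    """Return (accuracy, weighted F-measure) for a two-class confusion matrix."""
    fake_correct = n_fake - fake_as_real
    real_correct = n_real - real_as_fake
    total = n_fake + n_real
    accuracy = (fake_correct + real_correct) / total

    def f_measure(tp, fp, fn):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
        return 2 * tp / (2 * tp + fp + fn)

    f_fake = f_measure(fake_correct, real_as_fake, fake_as_real)
    f_real = f_measure(real_correct, fake_as_real, real_as_fake)
    # Assumption: each class's F-measure is weighted by its share of the instances.
    weighted_f = (n_fake * f_fake + n_real * f_real) / total
    return accuracy, weighted_f


first = evaluate(501, 499, fake_as_real=112, real_as_fake=27)
second = evaluate(501, 499, fake_as_real=60, real_as_fake=31)
print([round(v, 3) for v in first])
print([round(v, 3) for v in second])
```

    Under that assumption the counts reproduce the reported values: 0.861 accuracy and 0.860 F-measure without discretization, and 0.909 for both after discretization. The second experiment's 60 + 31 = 91 misclassifications correspond to 909 correctly classified instances out of 1000.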
  • 7. References
    Yazan Boshmaf et al. (February 2015). Íntegro: Leveraging Victim Prediction for Robust Fake Account Detection in OSNs. 15, 8-11.
    Buket Erşahin, Özlem Aktaş, Deniz Kılınç, Ceyhun Akyol. (2017). Twitter Fake Account Detection. IEEE.
    Supraja Gurajala, Joshua S. White, Brian Hudson, and Jeanna N. Matthews. (July 27, 2015). Fake Twitter accounts: Profile characteristics obtained using an activity-based pattern detection approach.
    Vladislav Kontsevoi, Naim Lujan, and Adrian Orozco. (May 14, 2014). Detecting Subversion of Twitter.