SlideShare a Scribd company logo
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 728
Twitter Spam Detection using Cobweb
Vidhi Tiwari1, Akarsh Srivastava2, Mrs. P. Akilandeswari3
1SRMIST, Department of Computer Science and Engineering, Kattankulathur Campus
2SRMIST, Department of Computer Science and Engineering, Kattankulathur Campus
3Assistant Professor, Department of Computer Science and Engineering, Kattankulathur Campus, SRM Institute of
Science and Technology (formerly known as SRM University).
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - With the advent of online social networks, or OSN, there has been a significant rise of users daily, with twitter being
one of the largest social media networking sites in the world, for the world to have ever used. The network has been an ever-
growing web of users who may or may not have malicious intent towards society. With the increase in the number of people
using the OSN, there are more chances of exploitation. One of the major sources of exploitation is spam on the twitter feed. It is
known by survey, that 1 in every 21 tweets is spam. There are multiple ways a tweet could be used in harmful ways as well as
multiple ways a tweet can be made into a spam and hence, it is necessary to differentiate a tweet that is normal or spam. This
project aims to classify spam using a data stream clustering technique called cobweb and a result enhancer namely gradient
booster. Cobweb is an enhanced clustering algorithm that classifies data using trees. We aim to combine this to gradient
boosting, which is also a tree-based classifier to create a more enhanced spam detection system than there already are.
Key Words: Twitter; spam; cobweb; gradient boosting; URL; hashtag; phishing; mention; logarithmic function
1. INTRODUCTION
Ever since the dawn of the time of online social networks, OSNs (for example Twitter, Facebook, Instagram), there have
been new and upcoming users everyday. This growing network provides opportunities to an array of people using the
network since multiple accounts from a user are possible. While many users have been using this commendable
technology for greater use such as reaching out to help to employment opportunities, organizing fundraisers or reaching
out to help causes, it is often used for malicious motives which, unfortunately has a lot of channels. Twitter has been one of
the most leading social media networks since OSNs with a growth rate of the number of users being exponential. A survey
conducted in 2018 showed that there are a total of 231 million users on Twitter. Out of these about 200 million log onto
their accounts everyday. This gives a lot of ammunition to spammers. A statistical study showed that 1in every 21 tweets is
spam. With a number this high, we could be duped potentially every time we are logged in. Spammers use multiple
techniques to create spam tweets. The most popular method is the shortened URLs. Since the URLs’ destination isn’t
known, spammers use it to deceive users. Clicking on the link would lead you to get malware. Another method used is the
use of hashtags. Any word with the symbol of ‘’ in front of it can be used as a traffic generator. Tapping on it could lead to
potential spam. ‘Mentions’ work the same way as hashtags. They both generate traffic on and off the platform.
2. CONCEPTS IMPLEMENTED
2.1 Cobweb Clustering
Cobweb is a data stream clustering algorithm that was developed by Douglas H.Fisher. It is a system that works on the idea
of conceptual clustering of the input data. The algorithm observes the data and creates a classification tree. Each node
symbolizes a class and all the daughter nodes are expressed in the form of a probabilistic concept (PC) which is a probable
summarization of all attributes of the nodes which is used as a label for the parent node (class). There are a few operations
that cobweb executes to create the classification trees:
1) Merging of The Nodes: Merging the nodes requires replacement by a singular node whose children are the additive of
the children of the primary node set. The child node should summarize the attribute-value distributions of all objects
classified under them in the tree.
2) Splitting a given node: The node in reference is divided into two and replaced with its children nodes.
3) Inserting a new node: A node is created which represents the object that will be inserted into the tree.
4) Passing an object down the hierarchy of the tree: This step calls the Cobweb algorithm on the object and the subtree
rooted in the node.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 729
The basic algorithm for cobweb works by taking a node as an input for a record. If the base has no children, the node at
hand is set as the child node and the category of the child’s features is added to the tree. However, if the case has children,
the Category Utility is calculated for the given node and the best CU, that is, the smallest value leading class inherits the
given node as a child. This process is repeated for all entries of the dataset. The main reason why a classification tree is
used is to predict any missing attribute or recognize a new class of a new object and add it to the tree.
2.2 Gradient Boosting
Gradient boosting is one of the machine learning techniques for any set of classification and regression problems. It gives,
as result, a prediction model in the form of an assemblage of weaker prediction models, which is a method that uses
multiple weak algorithms to train into a better working model, typically decision trees. It works on building the model in a
stage-like fashion much like most of the boosting algorithms do, and it creates a more accurate system by allowing the
optimization of an arbitrary differentiable LS to give a generalized result. In many supervised algorithms used as learning
algorithms, the programmer has an output variable p and a vector of input variables namely q described using a joint
probability distribution P(p,q). Using a training set (p,q). . . (pn,qn) of already known values of p and corresponding values
of q, the goal is to find an approximate value to a function that gives a minimized value of the expected value of some
specified LF. The gradient boosting algorithm assumes a real-valued q and looks for an approximation for the value of p,
from the given set of values, in the form of a weighted sum of functions from some class, called base learners, also known
as weak learners.
3. STATE OF ART
Spam detection has had tremendous breakthroughs since the onset of everyday use of social media platforms, especially
on twitter.
In 2019, a novel stream clustering technique was designed which used incremental naive bayes classifier that was trained
to work efficiently on micro-clusters. The INBs capture the boundary and mean values of the microclusters, whereas the
Euclidean distance just utilizes the mean value of the clusters. This mostly gives skewed results for asymmetric big micro-
clusters. In this paper, DenStream was promoted to, by the proposed framework, to be called INB-DenStream. To show the
efficacy of INB-DenStream, common and known methods much like DenStream, CluStream and StreamKM++ were applied
to Twitter datasets and their performances were noted for values such as purity, overall precision, overall recall, F1
measure, parametric sensitivity, and computational complexity. The results of the comparison showed the superiority of
their method to the rivals in almost all the datasets.
In 2017, Mahdi Washha,, Aziz Qaroush, Manel Mezghani and Florence Sedes formulated the Hidden Markov Model (HMM)
to be a time dependent model for real-time filtering of spam tweets solely based on the topic of the tweets. This method is
only functional on the easily available and openly accessible meta-data in the tweet object to detect spam tweets exiting
from a stream of tweets related to a topic (e.g., #Trump), while considering the state of previously handled tweets
associated with the same topic. Compared to the robust and well known classical time-independent classification method,
the Random Forest algorithm, the experimental evaluation shows that the efficiency increases on the increase the quality
of the topics in terms of overall precision, recalls, and F measure performance metrics.
In 2018, Rutuja Katpatal, Aparna Junnarkar studied the various classifiers and performed and presented a comparative
study of them all for research purposes.
Anti Trust Rank algorithm is a known link-based spam detection algorithm which works on the principle that spam pages
are more likely to be referenced by other spam pages. It may function on any possible platform. Since a real world
connected web graph involves multiple billions of nodes, it is crucial to develop work efficient spam detection algorithms
for making the process easier. Thus, in 2018, Yeon Seong Jeong, Joyce Jiyoung Whang, Inderjit S. Dhillon, Seung Goo Kang
and Jungmin Lee developed the asynchronous Anti Trust Rank algorithm which allows us to reduce the number of
arithmetic operations required to detect spam as compared to the traditional synchronous Anti-TrustRank algorithm by a
large margin, without lowering the performance in detecting web spam.
Spam detecting systems mainly build their classification architecture which includes the binary classification of their data
and then it can be solved by a chosen machine learning algorithm that suits best to give the best possible results. The ML
algorithms, for example, Naïve Bayes classifier or support vector machine classifier then report the behavior of these
designed models. Hence, in 2016, Miss Shukla, Twinkle Kailas and Prof.D.B. Kshirsagar showed the effect of the data
related factors, namely the spam to non-spam ratio, the training data size, and data sampling, to the detection performance
of the system. The feature of the shown system was time varying spam tweet detection. The System showed that spam
detection was a huge challenge and it bridges the space between the performance evaluation. It mainly focused on the data
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 730
that was used for training and testing the system, the features and the model to identify the genuine, legitimate user and
report the spam user by providing the answer in a binary value, that is, 1 or 0.
4. INFERENCE FROM SURVEY
Inference from the survey was done for over multiple journal papers as well as conference papers for a clear
understanding of the subject. The basic idea of spam on twitter was gathered by papers written by Aparna Junnarkar,
Rutuja Katpatal, Aparna Junnarkar [3]; H. Tsukayama [11]; D. Song, K. Thomas,V. Paxson and C. Grier [13]; G. Stringhini,G.
Vigna and C. Kruegel [14]; Chao Chen [23]. The tested methods were understood and the results were analyzed. This gave
an average measure of values as results that we use to compare our results with. This provides a standard to our work,
given this is a system that is being built to provide an efficient alternate method to detect spam on twitter.
5. DATA SET ANALYSIS AND STUDY
With the use of machine learning algorithms: cobweb and gradient boosting, there is a need to train the algorithm before
we test the system for detecting spam. Hence, there are two sets of data sets that are used.
5.1 Training Data set
This system uses a training data set which is used to help increase the accuracy of the algorithm used. The accuracy of the
performance of the algorithm depends solely on the amount of training data used and the complexity of the data set, that
is, a good mix of data in the data set. Programmers divide the data set that is acquired into two uneven parts. The greater,
three-fourth is used for training while the smaller division is used for the testing. That is done when not a lot of data can be
gathered. This project allowed us to gather enough tweets to have multiple data sets.
Table -1: Training data set attributes
5.2 Testing Data Set
The testing data set is similar to the training data set but smaller in data entries as it is only used for testing the accuracy of
the system. In this case, it is missing an attribute for spam or not spam as that is the test, but at times the testing and
training data set is the same.
Table -2: Testing data set attributes
Attribute Data Type
1 Tweet ID Integer
2 Tweet Text
3 Followers Integer
Attribute Data Type
1 Tweet ID Integer
2 Tweet Text
3 Followers Integer
4 Following Integer
5 Is_retweet Binary
6 Location Text
7 Spam_or_notspam Text (Spam/Not Spam)
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 731
4 Following Integer
5 Is_retweet Binary
6 Location Text
6. Methodology Used
With this project, we propose to attempt to improve the method with which spam on the social media platform, Twitter, is
detected.
We aim to implement cobweb, which is an enhanced conceptual clustering algorithm, that classifies data streams using
trees in combination with gradient boosting, which would be used to enhance the produced clustered structure fed into
the boosting technique to enhance the output accuracy.
This hybrid model works like a typical flow chart model where the cobweb technique would first create the tree by
creating classes and adding nodes to a class according to it’s Category Utility(CU). This is repeated for the length of the
data set and this tree is then fed into the gradient booster for enhancement of the classification. Gradient boosting creates
trees in a stage-wise manner using the weaker associations and trains the algorithm repeatedly to improve the
classification model.
7. Architecture Diagram
Fig -1: Architecture diagram representing the flow of the algorithm
8. MODULES
Module 1: Feature Extraction
Feature extraction is the method of reducing dimensions of the data at hand. This module will single out every part of the
tweet and label them separately. For example, all the characters after @ would be stored under a single username and
labeled. Some other features of the tweet include hashtags, links in the tweet, mentions, etc.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 732
Module 2: Matrix Creation
Matrix creation is the most crucial part of a big data working solution. This module creates a matrix out of the labeled data
from the previous module and feeds it into the main for prediction after the training of the algorithm is complete.
Module 3 : Training
The data from the complete dataset is divided into parts for training and testing. The bigger chunk of data is often used for
training the algorithm. Hence, this module of the project is concerned with training the algorithm or better functionality.
Module 4 : Testing
After the training of the algorithm, it is necessary to test the data on the remaining portion of the dataset that was saved
for testing. Hence, this module is designed for testing the trained algorithm to check if it gives the desired results.
Module 5 : Prediction
This module is concerned about prediction or final outcome of the entire project. It creates a classification report and lists
out the spam tweets and account IDs.
9. OUTPUTS AND RESULTS
The parameters used can vary greatly in the range of values that the algorithm can use. Hence it is important for them to
be reduced to a common range of values called scaling to be implemented throughout. The default range for Cobweb
clustering in 0.5. However it does not yield the correct results in all cases and needs to be changed accordingly.
Experimentally, different values of scaling were used against different numbers of inputs and the results were compiled
into a table, as given below in Fig-2.
Fig -2: Table with scaling values
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 733
The column heads contain the number of inputs during every iteration during the testing. First row heads contain the
scaling values used for every iteration.
The cells contain 2 values, the accuracy and time taken, in that order, for the corresponding values of scaling and number
of inputs.
For a particular number of inputs, the accuracy increases as the value of scaling increases. It stops increasing after scaling
settles down on a particular value. Similarly, time taken increases as the scaling value increases. Evidently, it is a
logarithmic function, the relation of scaling(s) and number of inputs(n). It is, therefore, optimal to use the value which
gives the maximum accuracy but takes the minimum time.
It is found to be:
(1)
In addition, the other parts of the results obtained can be divided into two sections namely graphic and declarative.
The graphic part of the output depicts the spam to not spam ratio, mostly depending on the amount of training given to the
data. This also brings statistical data to the output which can be used for comparison as shown in Fig-3.
Fig -3: Graphical Output
The plain declarative part of the output depicts the tweet id and in text, whether the tweet is spam or not, as depicted in
Figure 4. This is a rather tedious output to read due to the high amount of tweets in the data set.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 734
Fig -4: Declarative Output
10. CONCLUSION
Upon running the hybrid model for spam detection on the testing data, the desired output of tweet id and the declaration
of spam or not spam is achieved. The acquired value of f measure, which is dependent on the spam to non-spam ratio, is
found to be at par with the existing models. Hence, it is safe to conclude, this is an alternate yet successful hybrid model to
detect spam on twitter.
11. REFERENCES
[1] A Novel Stream Clustering Framework for Spam Detection in Twitter; Hadi Tajalizadeh and Reza Boostani; IEEE
TRANSACTIONS ONCOMPUTATIONAL SOCIAL SYSTEMS, VOL. 6, NO. 3, JUNE 2019
[2] A Topic-Based Hidden Markov Model for Real-Time Spam Tweets Filtering; Mahdi Washha, Aziz Qaroush, Manel
Mezghani, Florence Sedes; 21st International Conference on Knowledge Based and Intelligent Information and
Engineering Systems, KES2017, 6-8 September 2017, Marseille, France
[3] Spam Detection Techniques for Twitter; Rutuja Katpatal, Aparna Junnarkar; International Research Journal of
Engineering and Technology (IRJET), Volume 05, May 2018.
[4] Design of Machine Learning Approach For Spam Tweet Detection; Miss. Shukla Twinkle Kailas, Prof.D.B.Kshirsagar;
IJARIIE-ISSN (O), Volume 3, May 2018.
[5] Fast Asynchronous Anti-TrustRank forWeb Spam Detection; Joyce Jiyoung Whang, Yeon Seong Jeong,Inderjit S. Dhillon,
Seonggoo Kang, Jungmin Lee; MIS2, Marina Del Rey, CA, USA, Volume 02, Issue 5, April 2016.
[6] O. Kurasova, V. Marcinkevicius, V. Medvedev, A. Rapecka, and P. Stefanovic, “Strategies for big data clustering,” in Proc.
IEEE 26th Int. Conf. Tools Artif. Intell., Nov. 2014, pp. 740–747.
[7] Twitter spam detection; Shradha Hirve,Swarupa Kamble ; IJESC,49-57, April 2017.
[8] Constraine d NMF-base d semi-supervise d learning for social me dia spammer detection ;Dingguo Yu, Nan Chen, Frank
Jiang,Bin Fu, Aihong Qin; Elsevier-2018.
[9] Early filtering of ephemeral malicious accounts on Twitter; Sangho Lee, Jong Kim; Elsevier, Issue 48-57, 2015.
[10] Twitter Spam detection; Shradha Hirve,Swarupa Kamle; IJESC-2016.
[11] Twitter turns 7 : User sends over 400 million tweets everyday; H. Tsukayama; Washington Post(2013)
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 735
[12] Twitter spammer detection using data stream clustering ; Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang,
April-2017.
[13] Suspended accounts in retrospect: An analysis of Twitter spam;K. Thomas, C. Grier, D. Song, and V. Paxson(2013).
[14] Detecting Spammers on social media; G. Stringhini, C. Kruegel, and G. Vigna(2015)
[15] An in-depth analysis of abuse on Twitter; J. Oliver, P. Pajares, C. Ke, C. Chen, and Y. Xiang,
[16] Semi-supervised spam detection in Twitter stream; S. Sedhai and A. Sun; IEEE Trans. Comput. Social Syst., vol. 5, no. 1,
2018
[17] A performance evaluation of machine learning-based streaming spam tweets detection; C Chen el at.; IEEE Trans.
Comput. Social Syst., vol. 2, no. 3, 2015
[18] Twitter spam detection based on deep learning; T. Wu, S. Liu, J. Zhang, and Y. Xiang, Elsevier-2017
[19] Statistical Features-Based Real-Time Detection of Drifted Twitter Spam; Chao Chen, Yu Wang, Jun Zhang, Yang Xiang,
Wanlei Zhou,Geyong Min; IEEE Ttransaction on Information Forensics and Security, VOL. 12, NO. 4, APRIL 2017
[20] Detecting spammer on twitter; F. Benevenuto, G. Magno, T. Rodrigues, and V. Almeida; Proc. 7th Annu. Collaboration,
Electron. Messaging, Anti-Abuse Spam Conf, 2010
[21] What is twitter, a social network or a news media; H. Kwak, C. Lee, H. Park, and S. Moon; Proc. 19th Int. Conf. World
Wide Web, 2010
[22] A Survey of Spam Detection Methods on Twitter; Abdullah Talha Kabakus and Resul Kara, International Journal of
Advanced Computer Science and Applications, Vol. 8, No. 3, 2017
[23] Investigating the deceptive information in Twitter spam; Chao Chen et al. ; Future Generation Computer Systems
(2017)
[24] WarningBird: A Near Real-Time Detection System for Suspicious URLs in Twitter Stream; Sangho Lee and Jong Kim;
IEEE transactions on dependable and secure computing - 2016
[25] Reuters tracer: A large scale system of detecting & verifying real-time news events from twitter; Xiaomo Liu et al.;
25th ACM CIKM. 20
AUTHORS
Vidhi Tiwari is a fourth-year student currently
pursuing her bachelor’s degree in Computer
Science Engineering from SRM Institute of Science
and Technology. She has completed courses like
Web development in Php and Data Science in R.
Avid reader and learner, she is also interested in
writing.
Akarsh Srivastava is a fourth-year student
currently pursuing his bachelor’s degree in
COmputer Science from SRM Institute of Science
and Technology. An active participant in the
college club “Humanoid”, he has handled the
coding department of the same, leading him to
take part in hackathons. He has completed
courses such as Machine learning and is
interested in deep learning and neural nets.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 736
Mrs. P. Akhilandeswari holds the position of
Assistant Professor at SRM Institute of Science
and Technology in the department of Computer
Science till date. She has had over 15 years of
experience. Her interests in computing lie in IOT,
data science and big data, and cloud computing.

More Related Content

What's hot

Mcs 021
Mcs 021Mcs 021
Mcs 021
Ujjwal Kumar
 
Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...
ijsrd.com
 
WARRANTS GENERATIONS USING A LANGUAGE MODEL AND A MULTI-AGENT SYSTEM
WARRANTS GENERATIONS USING A LANGUAGE MODEL AND A MULTI-AGENT SYSTEMWARRANTS GENERATIONS USING A LANGUAGE MODEL AND A MULTI-AGENT SYSTEM
WARRANTS GENERATIONS USING A LANGUAGE MODEL AND A MULTI-AGENT SYSTEM
ijnlc
 
Link prediction 방법의 개념 및 활용
Link prediction 방법의 개념 및 활용Link prediction 방법의 개념 및 활용
Link prediction 방법의 개념 및 활용
Kyunghoon Kim
 
Link prediction with the linkpred tool
Link prediction with the linkpred toolLink prediction with the linkpred tool
Link prediction with the linkpred tool
Raf Guns
 
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGFAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
ijnlc
 
Sub-Graph Finding Information over Nebula Networks
Sub-Graph Finding Information over Nebula NetworksSub-Graph Finding Information over Nebula Networks
Sub-Graph Finding Information over Nebula Networks
ijceronline
 
Ppt
PptPpt
Survey on Location Based Recommendation System Using POI
Survey on Location Based Recommendation System Using POISurvey on Location Based Recommendation System Using POI
Survey on Location Based Recommendation System Using POI
IRJET Journal
 
36x48_new_modelling_cloud_infrastructure
36x48_new_modelling_cloud_infrastructure36x48_new_modelling_cloud_infrastructure
36x48_new_modelling_cloud_infrastructure
Washington Garcia
 
IRJET - Suicidal Text Detection using Machine Learning
IRJET -  	  Suicidal Text Detection using Machine LearningIRJET -  	  Suicidal Text Detection using Machine Learning
IRJET - Suicidal Text Detection using Machine Learning
IRJET Journal
 
Top k query processing and malicious node identification based on node groupi...
Top k query processing and malicious node identification based on node groupi...Top k query processing and malicious node identification based on node groupi...
Top k query processing and malicious node identification based on node groupi...
redpel dot com
 
757
757757
Generating synthetic online social network graph data and topologies
Generating synthetic online social network graph data and topologiesGenerating synthetic online social network graph data and topologies
Generating synthetic online social network graph data and topologies
Graph-TA
 
MMP-TREE FOR SEQUENTIAL PATTERN MINING WITH MULTIPLE MINIMUM SUPPORTS IN PROG...
MMP-TREE FOR SEQUENTIAL PATTERN MINING WITH MULTIPLE MINIMUM SUPPORTS IN PROG...MMP-TREE FOR SEQUENTIAL PATTERN MINING WITH MULTIPLE MINIMUM SUPPORTS IN PROG...
MMP-TREE FOR SEQUENTIAL PATTERN MINING WITH MULTIPLE MINIMUM SUPPORTS IN PROG...
IJCSEA Journal
 
Twitter text mining using sas
Twitter text mining using sasTwitter text mining using sas
Twitter text mining using sas
Analyst
 
IRJET - Cyberbulling Detection Model
IRJET -  	  Cyberbulling Detection ModelIRJET -  	  Cyberbulling Detection Model
IRJET - Cyberbulling Detection Model
IRJET Journal
 
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
IRJET Journal
 
SAS Text Mining
SAS Text MiningSAS Text Mining
SAS Text Mining
Mitchell Sanregret
 
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET Journal
 

What's hot (20)

Mcs 021
Mcs 021Mcs 021
Mcs 021
 
Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...
 
WARRANTS GENERATIONS USING A LANGUAGE MODEL AND A MULTI-AGENT SYSTEM
WARRANTS GENERATIONS USING A LANGUAGE MODEL AND A MULTI-AGENT SYSTEMWARRANTS GENERATIONS USING A LANGUAGE MODEL AND A MULTI-AGENT SYSTEM
WARRANTS GENERATIONS USING A LANGUAGE MODEL AND A MULTI-AGENT SYSTEM
 
Link prediction 방법의 개념 및 활용
Link prediction 방법의 개념 및 활용Link prediction 방법의 개념 및 활용
Link prediction 방법의 개념 및 활용
 
Link prediction with the linkpred tool
Link prediction with the linkpred toolLink prediction with the linkpred tool
Link prediction with the linkpred tool
 
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGFAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
 
Sub-Graph Finding Information over Nebula Networks
Sub-Graph Finding Information over Nebula NetworksSub-Graph Finding Information over Nebula Networks
Sub-Graph Finding Information over Nebula Networks
 
Ppt
PptPpt
Ppt
 
Survey on Location Based Recommendation System Using POI
Survey on Location Based Recommendation System Using POISurvey on Location Based Recommendation System Using POI
Survey on Location Based Recommendation System Using POI
 
36x48_new_modelling_cloud_infrastructure
36x48_new_modelling_cloud_infrastructure36x48_new_modelling_cloud_infrastructure
36x48_new_modelling_cloud_infrastructure
 
IRJET - Suicidal Text Detection using Machine Learning
IRJET -  	  Suicidal Text Detection using Machine LearningIRJET -  	  Suicidal Text Detection using Machine Learning
IRJET - Suicidal Text Detection using Machine Learning
 
Top k query processing and malicious node identification based on node groupi...
Top k query processing and malicious node identification based on node groupi...Top k query processing and malicious node identification based on node groupi...
Top k query processing and malicious node identification based on node groupi...
 
757
757757
757
 
Generating synthetic online social network graph data and topologies
Generating synthetic online social network graph data and topologiesGenerating synthetic online social network graph data and topologies
Generating synthetic online social network graph data and topologies
 
MMP-TREE FOR SEQUENTIAL PATTERN MINING WITH MULTIPLE MINIMUM SUPPORTS IN PROG...
MMP-TREE FOR SEQUENTIAL PATTERN MINING WITH MULTIPLE MINIMUM SUPPORTS IN PROG...MMP-TREE FOR SEQUENTIAL PATTERN MINING WITH MULTIPLE MINIMUM SUPPORTS IN PROG...
MMP-TREE FOR SEQUENTIAL PATTERN MINING WITH MULTIPLE MINIMUM SUPPORTS IN PROG...
 
Twitter text mining using sas
Twitter text mining using sasTwitter text mining using sas
Twitter text mining using sas
 
IRJET - Cyberbulling Detection Model
IRJET -  	  Cyberbulling Detection ModelIRJET -  	  Cyberbulling Detection Model
IRJET - Cyberbulling Detection Model
 
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
 
SAS Text Mining
SAS Text MiningSAS Text Mining
SAS Text Mining
 
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
 

Similar to IRJET - Twitter Spam Detection using Cobweb

Analyzing Social media’s real data detection through Web content mining using...
Analyzing Social media’s real data detection through Web content mining using...Analyzing Social media’s real data detection through Web content mining using...
Analyzing Social media’s real data detection through Web content mining using...
IRJET Journal
 
Fake News Detection using Deep Learning
Fake News Detection using Deep LearningFake News Detection using Deep Learning
Fake News Detection using Deep Learning
NIET Journal of Engineering & Technology (NIETJET)
 
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
paperpublications3
 
IRJET - Fake News Detection: A Survey
IRJET -  	  Fake News Detection: A SurveyIRJET -  	  Fake News Detection: A Survey
IRJET - Fake News Detection: A Survey
IRJET Journal
 
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’SDiscovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’S
IRJET Journal
 
IRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and Challenges
IRJET Journal
 
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Editor IJAIEM
 
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
ijaia
 
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Pete Burnap
 
Fake News Detection using Passive Aggressive and Naïve Bayes
Fake News Detection using Passive Aggressive and Naïve BayesFake News Detection using Passive Aggressive and Naïve Bayes
Fake News Detection using Passive Aggressive and Naïve Bayes
IRJET Journal
 
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning AlgorithmsIRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET Journal
 
A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...
IRJET Journal
 
A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...
IRJET Journal
 
IRJET- Cross System User Modeling and Personalization on the Social Web
IRJET- Cross System User Modeling and Personalization on the Social WebIRJET- Cross System User Modeling and Personalization on the Social Web
IRJET- Cross System User Modeling and Personalization on the Social Web
IRJET Journal
 
Detecting root of the rumor in social network using GSSS
Detecting root of the rumor in social network using GSSSDetecting root of the rumor in social network using GSSS
Detecting root of the rumor in social network using GSSS
IRJET Journal
 
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesFeature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering Techniques
IRJET Journal
 
Detecting Malicious Bots in Social Media Accounts Using Machine Learning Tech...
Detecting Malicious Bots in Social Media Accounts Using Machine Learning Tech...Detecting Malicious Bots in Social Media Accounts Using Machine Learning Tech...
Detecting Malicious Bots in Social Media Accounts Using Machine Learning Tech...
IRJET Journal
 
MAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATION
MAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATIONMAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATION
MAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATION
IJNSA Journal
 
MAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATION
MAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATIONMAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATION
MAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATION
IJNSA Journal
 
Social Friend Overlying Communities Based on Social Network Context
Social Friend Overlying Communities Based on Social Network ContextSocial Friend Overlying Communities Based on Social Network Context
Social Friend Overlying Communities Based on Social Network Context
IRJET Journal
 

Similar to IRJET - Twitter Spam Detection using Cobweb (20)

Analyzing Social media’s real data detection through Web content mining using...
Analyzing Social media’s real data detection through Web content mining using...Analyzing Social media’s real data detection through Web content mining using...
Analyzing Social media’s real data detection through Web content mining using...
 
Fake News Detection using Deep Learning
Fake News Detection using Deep LearningFake News Detection using Deep Learning
Fake News Detection using Deep Learning
 
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
 
IRJET - Fake News Detection: A Survey
IRJET -  	  Fake News Detection: A SurveyIRJET -  	  Fake News Detection: A Survey
IRJET - Fake News Detection: A Survey
 
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’SDiscovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’S
 
IRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and Challenges
 
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
 
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
 
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
 
Fake News Detection using Passive Aggressive and Naïve Bayes
Fake News Detection using Passive Aggressive and Naïve BayesFake News Detection using Passive Aggressive and Naïve Bayes
Fake News Detection using Passive Aggressive and Naïve Bayes
 
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning AlgorithmsIRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
 
A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...
 
A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...A study of cyberbullying detection using Deep Learning and Machine Learning T...
A study of cyberbullying detection using Deep Learning and Machine Learning T...
 
IRJET- Cross System User Modeling and Personalization on the Social Web
IRJET- Cross System User Modeling and Personalization on the Social WebIRJET- Cross System User Modeling and Personalization on the Social Web
IRJET- Cross System User Modeling and Personalization on the Social Web
 
Detecting root of the rumor in social network using GSSS
Detecting root of the rumor in social network using GSSSDetecting root of the rumor in social network using GSSS
Detecting root of the rumor in social network using GSSS
 
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesFeature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering Techniques
 
Detecting Malicious Bots in Social Media Accounts Using Machine Learning Tech...
Detecting Malicious Bots in Social Media Accounts Using Machine Learning Tech...Detecting Malicious Bots in Social Media Accounts Using Machine Learning Tech...
Detecting Malicious Bots in Social Media Accounts Using Machine Learning Tech...
 
MAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATION
MAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATIONMAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATION
MAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATION
 
MAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATION
MAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATIONMAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATION
MAPREDUCE IMPLEMENTATION FOR MALICIOUS WEBSITES CLASSIFICATION
 
Social Friend Overlying Communities Based on Social Network Context
Social Friend Overlying Communities Based on Social Network ContextSocial Friend Overlying Communities Based on Social Network Context
Social Friend Overlying Communities Based on Social Network Context
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
IRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
IRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
IRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
IRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
IRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
IRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
IRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
IRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

Welding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdfWelding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdf
AjmalKhan50578
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
cnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classicationcnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classication
SakkaravarthiShanmug
 
artificial intelligence and data science contents.pptx
artificial intelligence and data science contents.pptxartificial intelligence and data science contents.pptx
artificial intelligence and data science contents.pptx
GauravCar
 
Certificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi AhmedCertificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi Ahmed
Mahmoud Morsy
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
bijceesjournal
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
21UME003TUSHARDEB
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
Hitesh Mohapatra
 
People as resource Grade IX.pdf minimala
People as resource Grade IX.pdf minimalaPeople as resource Grade IX.pdf minimala
People as resource Grade IX.pdf minimala
riddhimaagrawal986
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
Divyanshu
 
CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1
PKavitha10
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
VANDANAMOHANGOUDA
 
Data Control Language.pptx Data Control Language.pptx
Data Control Language.pptx Data Control Language.pptxData Control Language.pptx Data Control Language.pptx
Data Control Language.pptx Data Control Language.pptx
ramrag33
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...
bijceesjournal
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
UReason
 
Software Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.pptSoftware Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.ppt
TaghreedAltamimi
 
BRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdfBRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdf
LAXMAREDDY22
 

Recently uploaded (20)

Welding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdfWelding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdf
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
cnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classicationcnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classication
 
artificial intelligence and data science contents.pptx
artificial intelligence and data science contents.pptxartificial intelligence and data science contents.pptx
artificial intelligence and data science contents.pptx
 
Certificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi AhmedCertificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi Ahmed
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
 
People as resource Grade IX.pdf minimala
People as resource Grade IX.pdf minimalaPeople as resource Grade IX.pdf minimala
People as resource Grade IX.pdf minimala
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
 
CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
 
Data Control Language.pptx Data Control Language.pptx
Data Control Language.pptx Data Control Language.pptxData Control Language.pptx Data Control Language.pptx
Data Control Language.pptx Data Control Language.pptx
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 
Software Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.pptSoftware Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.ppt
 
BRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdfBRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdf
 

IRJET - Twitter Spam Detection using Cobweb

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 728 Twitter Spam Detection using Cobweb Vidhi Tiwari1, Akarsh Srivastava2, Mrs. P. Akilandeswari3 1SRMIST, Department of Computer Science and Engineering, Kattankulathur Campus 2SRMIST, Department of Computer Science and Engineering, Kattankulathur Campus 3Assistant Professor, Department of Computer Science and Engineering, Kattankulathur Campus, SRM Institute of Science and Technology (formerly known as SRM University). ---------------------------------------------------------------------***---------------------------------------------------------------------- Abstract - With the advent of online social networks, or OSN, there has been a significant rise of users daily, with twitter being one of the largest social media networking sites in the world, for the world to have ever used. The network has been an ever- growing web of users who may or may not have malicious intent towards society. With the increase in the number of people using the OSN, there are more chances of exploitation. One of the major sources of exploitation is spam on the twitter feed. It is known by survey, that 1 in every 21 tweets is spam. There are multiple ways a tweet could be used in harmful ways as well as multiple ways a tweet can be made into a spam and hence, it is necessary to differentiate a tweet that is normal or spam. This project aims to classify spam using a data stream clustering technique called cobweb and a result enhancer namely gradient booster. Cobweb is an enhanced clustering algorithm that classifies data using trees. We aim to combine this to gradient boosting, which is also a tree-based classifier to create a more enhanced spam detection system than there already are. Key Words: Twitter; spam; cobweb; gradient boosting; URL; hashtag; phishing; mention; logarithmic function 1. INTRODUCTION Ever since the dawn of the time of online social networks, OSNs (for example Twitter, Facebook, Instagram), there have been new and upcoming users everyday. This growing network provides opportunities to an array of people using the network since multiple accounts from a user are possible. While many users have been using this commendable technology for greater use such as reaching out to help to employment opportunities, organizing fundraisers or reaching out to help causes, it is often used for malicious motives which, unfortunately has a lot of channels. Twitter has been one of the most leading social media networks since OSNs with a growth rate of the number of users being exponential. A survey conducted in 2018 showed that there are a total of 231 million users on Twitter. Out of these about 200 million log onto their accounts everyday. This gives a lot of ammunition to spammers. A statistical study showed that 1in every 21 tweets is spam. With a number this high, we could be duped potentially every time we are logged in. Spammers use multiple techniques to create spam tweets. The most popular method is the shortened URLs. Since the URLs’ destination isn’t known, spammers use it to deceive users. Clicking on the link would lead you to get malware. Another method used is the use of hashtags. Any word with the symbol of ‘’ in front of it can be used as a traffic generator. Tapping on it could lead to potential spam. ‘Mentions’ work the same way as hashtags. They both generate traffic on and off the platform. 2. CONCEPTS IMPLEMENTED 2.1 Cobweb Clustering Cobweb is a data stream clustering algorithm that was developed by Douglas H.Fisher. It is a system that works on the idea of conceptual clustering of the input data. The algorithm observes the data and creates a classification tree. Each node symbolizes a class and all the daughter nodes are expressed in the form of a probabilistic concept (PC) which is a probable summarization of all attributes of the nodes which is used as a label for the parent node (class). There are a few operations that cobweb executes to create the classification trees: 1) Merging of The Nodes: Merging the nodes requires replacement by a singular node whose children are the additive of the children of the primary node set. The child node should summarize the attribute-value distributions of all objects classified under them in the tree. 2) Splitting a given node: The node in reference is divided into two and replaced with its children nodes. 3) Inserting a new node: A node is created which represents the object that will be inserted into the tree. 4) Passing an object down the hierarchy of the tree: This step calls the Cobweb algorithm on the object and the subtree rooted in the node.
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 729 The basic algorithm for cobweb works by taking a node as an input for a record. If the base has no children, the node at hand is set as the child node and the category of the child’s features is added to the tree. However, if the case has children, the Category Utility is calculated for the given node and the best CU, that is, the smallest value leading class inherits the given node as a child. This process is repeated for all entries of the dataset. The main reason why a classification tree is used is to predict any missing attribute or recognize a new class of a new object and add it to the tree. 2.2 Gradient Boosting Gradient boosting is one of the machine learning techniques for any set of classification and regression problems. It gives, as result, a prediction model in the form of an assemblage of weaker prediction models, which is a method that uses multiple weak algorithms to train into a better working model, typically decision trees. It works on building the model in a stage-like fashion much like most of the boosting algorithms do, and it creates a more accurate system by allowing the optimization of an arbitrary differentiable LS to give a generalized result. In many supervised algorithms used as learning algorithms, the programmer has an output variable p and a vector of input variables namely q described using a joint probability distribution P(p,q). Using a training set (p,q). . . (pn,qn) of already known values of p and corresponding values of q, the goal is to find an approximate value to a function that gives a minimized value of the expected value of some specified LF. The gradient boosting algorithm assumes a real-valued q and looks for an approximation for the value of p, from the given set of values, in the form of a weighted sum of functions from some class, called base learners, also known as weak learners. 3. STATE OF ART Spam detection has had tremendous breakthroughs since the onset of everyday use of social media platforms, especially on twitter. In 2019, a novel stream clustering technique was designed which used incremental naive bayes classifier that was trained to work efficiently on micro-clusters. The INBs capture the boundary and mean values of the microclusters, whereas the Euclidean distance just utilizes the mean value of the clusters. This mostly gives skewed results for asymmetric big micro- clusters. In this paper, DenStream was promoted to, by the proposed framework, to be called INB-DenStream. To show the efficacy of INB-DenStream, common and known methods much like DenStream, CluStream and StreamKM++ were applied to Twitter datasets and their performances were noted for values such as purity, overall precision, overall recall, F1 measure, parametric sensitivity, and computational complexity. The results of the comparison showed the superiority of their method to the rivals in almost all the datasets. In 2017, Mahdi Washha,, Aziz Qaroush, Manel Mezghani and Florence Sedes formulated the Hidden Markov Model (HMM) to be a time dependent model for real-time filtering of spam tweets solely based on the topic of the tweets. This method is only functional on the easily available and openly accessible meta-data in the tweet object to detect spam tweets exiting from a stream of tweets related to a topic (e.g., #Trump), while considering the state of previously handled tweets associated with the same topic. Compared to the robust and well known classical time-independent classification method, the Random Forest algorithm, the experimental evaluation shows that the efficiency increases on the increase the quality of the topics in terms of overall precision, recalls, and F measure performance metrics. In 2018, Rutuja Katpatal, Aparna Junnarkar studied the various classifiers and performed and presented a comparative study of them all for research purposes. Anti Trust Rank algorithm is a known link-based spam detection algorithm which works on the principle that spam pages are more likely to be referenced by other spam pages. It may function on any possible platform. Since a real world connected web graph involves multiple billions of nodes, it is crucial to develop work efficient spam detection algorithms for making the process easier. Thus, in 2018, Yeon Seong Jeong, Joyce Jiyoung Whang, Inderjit S. Dhillon, Seung Goo Kang and Jungmin Lee developed the asynchronous Anti Trust Rank algorithm which allows us to reduce the number of arithmetic operations required to detect spam as compared to the traditional synchronous Anti-TrustRank algorithm by a large margin, without lowering the performance in detecting web spam. Spam detecting systems mainly build their classification architecture which includes the binary classification of their data and then it can be solved by a chosen machine learning algorithm that suits best to give the best possible results. The ML algorithms, for example, Naïve Bayes classifier or support vector machine classifier then report the behavior of these designed models. Hence, in 2016, Miss Shukla, Twinkle Kailas and Prof.D.B. Kshirsagar showed the effect of the data related factors, namely the spam to non-spam ratio, the training data size, and data sampling, to the detection performance of the system. The feature of the shown system was time varying spam tweet detection. The System showed that spam detection was a huge challenge and it bridges the space between the performance evaluation. It mainly focused on the data
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 730 that was used for training and testing the system, the features and the model to identify the genuine, legitimate user and report the spam user by providing the answer in a binary value, that is, 1 or 0. 4. INFERENCE FROM SURVEY Inference from the survey was done for over multiple journal papers as well as conference papers for a clear understanding of the subject. The basic idea of spam on twitter was gathered by papers written by Aparna Junnarkar, Rutuja Katpatal, Aparna Junnarkar [3]; H. Tsukayama [11]; D. Song, K. Thomas,V. Paxson and C. Grier [13]; G. Stringhini,G. Vigna and C. Kruegel [14]; Chao Chen [23]. The tested methods were understood and the results were analyzed. This gave an average measure of values as results that we use to compare our results with. This provides a standard to our work, given this is a system that is being built to provide an efficient alternate method to detect spam on twitter. 5. DATA SET ANALYSIS AND STUDY With the use of machine learning algorithms: cobweb and gradient boosting, there is a need to train the algorithm before we test the system for detecting spam. Hence, there are two sets of data sets that are used. 5.1 Training Data set This system uses a training data set which is used to help increase the accuracy of the algorithm used. The accuracy of the performance of the algorithm depends solely on the amount of training data used and the complexity of the data set, that is, a good mix of data in the data set. Programmers divide the data set that is acquired into two uneven parts. The greater, three-fourth is used for training while the smaller division is used for the testing. That is done when not a lot of data can be gathered. This project allowed us to gather enough tweets to have multiple data sets. Table -1: Training data set attributes 5.2 Testing Data Set The testing data set is similar to the training data set but smaller in data entries as it is only used for testing the accuracy of the system. In this case, it is missing an attribute for spam or not spam as that is the test, but at times the testing and training data set is the same. Table -2: Testing data set attributes Attribute Data Type 1 Tweet ID Integer 2 Tweet Text 3 Followers Integer Attribute Data Type 1 Tweet ID Integer 2 Tweet Text 3 Followers Integer 4 Following Integer 5 Is_retweet Binary 6 Location Text 7 Spam_or_notspam Text (Spam/Not Spam)
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 731 4 Following Integer 5 Is_retweet Binary 6 Location Text 6. Methodology Used With this project, we propose to attempt to improve the method with which spam on the social media platform, Twitter, is detected. We aim to implement cobweb, which is an enhanced conceptual clustering algorithm, that classifies data streams using trees in combination with gradient boosting, which would be used to enhance the produced clustered structure fed into the boosting technique to enhance the output accuracy. This hybrid model works like a typical flow chart model where the cobweb technique would first create the tree by creating classes and adding nodes to a class according to it’s Category Utility(CU). This is repeated for the length of the data set and this tree is then fed into the gradient booster for enhancement of the classification. Gradient boosting creates trees in a stage-wise manner using the weaker associations and trains the algorithm repeatedly to improve the classification model. 7. Architecture Diagram Fig -1: Architecture diagram representing the flow of the algorithm 8. MODULES Module 1: Feature Extraction Feature extraction is the method of reducing dimensions of the data at hand. This module will single out every part of the tweet and label them separately. For example, all the characters after @ would be stored under a single username and labeled. Some other features of the tweet include hashtags, links in the tweet, mentions, etc.
  • 5. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 732 Module 2: Matrix Creation Matrix creation is the most crucial part of a big data working solution. This module creates a matrix out of the labeled data from the previous module and feeds it into the main for prediction after the training of the algorithm is complete. Module 3 : Training The data from the complete dataset is divided into parts for training and testing. The bigger chunk of data is often used for training the algorithm. Hence, this module of the project is concerned with training the algorithm or better functionality. Module 4 : Testing After the training of the algorithm, it is necessary to test the data on the remaining portion of the dataset that was saved for testing. Hence, this module is designed for testing the trained algorithm to check if it gives the desired results. Module 5 : Prediction This module is concerned about prediction or final outcome of the entire project. It creates a classification report and lists out the spam tweets and account IDs. 9. OUTPUTS AND RESULTS The parameters used can vary greatly in the range of values that the algorithm can use. Hence it is important for them to be reduced to a common range of values called scaling to be implemented throughout. The default range for Cobweb clustering in 0.5. However it does not yield the correct results in all cases and needs to be changed accordingly. Experimentally, different values of scaling were used against different numbers of inputs and the results were compiled into a table, as given below in Fig-2. Fig -2: Table with scaling values
  • 6. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 733 The column heads contain the number of inputs during every iteration during the testing. First row heads contain the scaling values used for every iteration. The cells contain 2 values, the accuracy and time taken, in that order, for the corresponding values of scaling and number of inputs. For a particular number of inputs, the accuracy increases as the value of scaling increases. It stops increasing after scaling settles down on a particular value. Similarly, time taken increases as the scaling value increases. Evidently, it is a logarithmic function, the relation of scaling(s) and number of inputs(n). It is, therefore, optimal to use the value which gives the maximum accuracy but takes the minimum time. It is found to be: (1) In addition, the other parts of the results obtained can be divided into two sections namely graphic and declarative. The graphic part of the output depicts the spam to not spam ratio, mostly depending on the amount of training given to the data. This also brings statistical data to the output which can be used for comparison as shown in Fig-3. Fig -3: Graphical Output The plain declarative part of the output depicts the tweet id and in text, whether the tweet is spam or not, as depicted in Figure 4. This is a rather tedious output to read due to the high amount of tweets in the data set.
  • 7. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 734 Fig -4: Declarative Output 10. CONCLUSION Upon running the hybrid model for spam detection on the testing data, the desired output of tweet id and the declaration of spam or not spam is achieved. The acquired value of f measure, which is dependent on the spam to non-spam ratio, is found to be at par with the existing models. Hence, it is safe to conclude, this is an alternate yet successful hybrid model to detect spam on twitter. 11. REFERENCES [1] A Novel Stream Clustering Framework for Spam Detection in Twitter; Hadi Tajalizadeh and Reza Boostani; IEEE TRANSACTIONS ONCOMPUTATIONAL SOCIAL SYSTEMS, VOL. 6, NO. 3, JUNE 2019 [2] A Topic-Based Hidden Markov Model for Real-Time Spam Tweets Filtering; Mahdi Washha, Aziz Qaroush, Manel Mezghani, Florence Sedes; 21st International Conference on Knowledge Based and Intelligent Information and Engineering Systems, KES2017, 6-8 September 2017, Marseille, France [3] Spam Detection Techniques for Twitter; Rutuja Katpatal, Aparna Junnarkar; International Research Journal of Engineering and Technology (IRJET), Volume 05, May 2018. [4] Design of Machine Learning Approach For Spam Tweet Detection; Miss. Shukla Twinkle Kailas, Prof.D.B.Kshirsagar; IJARIIE-ISSN (O), Volume 3, May 2018. [5] Fast Asynchronous Anti-TrustRank forWeb Spam Detection; Joyce Jiyoung Whang, Yeon Seong Jeong,Inderjit S. Dhillon, Seonggoo Kang, Jungmin Lee; MIS2, Marina Del Rey, CA, USA, Volume 02, Issue 5, April 2016. [6] O. Kurasova, V. Marcinkevicius, V. Medvedev, A. Rapecka, and P. Stefanovic, “Strategies for big data clustering,” in Proc. IEEE 26th Int. Conf. Tools Artif. Intell., Nov. 2014, pp. 740–747. [7] Twitter spam detection; Shradha Hirve,Swarupa Kamble ; IJESC,49-57, April 2017. [8] Constraine d NMF-base d semi-supervise d learning for social me dia spammer detection ;Dingguo Yu, Nan Chen, Frank Jiang,Bin Fu, Aihong Qin; Elsevier-2018. [9] Early filtering of ephemeral malicious accounts on Twitter; Sangho Lee, Jong Kim; Elsevier, Issue 48-57, 2015. [10] Twitter Spam detection; Shradha Hirve,Swarupa Kamle; IJESC-2016. [11] Twitter turns 7 : User sends over 400 million tweets everyday; H. Tsukayama; Washington Post(2013)
  • 8. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 735 [12] Twitter spammer detection using data stream clustering ; Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang, April-2017. [13] Suspended accounts in retrospect: An analysis of Twitter spam;K. Thomas, C. Grier, D. Song, and V. Paxson(2013). [14] Detecting Spammers on social media; G. Stringhini, C. Kruegel, and G. Vigna(2015) [15] An in-depth analysis of abuse on Twitter; J. Oliver, P. Pajares, C. Ke, C. Chen, and Y. Xiang, [16] Semi-supervised spam detection in Twitter stream; S. Sedhai and A. Sun; IEEE Trans. Comput. Social Syst., vol. 5, no. 1, 2018 [17] A performance evaluation of machine learning-based streaming spam tweets detection; C Chen el at.; IEEE Trans. Comput. Social Syst., vol. 2, no. 3, 2015 [18] Twitter spam detection based on deep learning; T. Wu, S. Liu, J. Zhang, and Y. Xiang, Elsevier-2017 [19] Statistical Features-Based Real-Time Detection of Drifted Twitter Spam; Chao Chen, Yu Wang, Jun Zhang, Yang Xiang, Wanlei Zhou,Geyong Min; IEEE Ttransaction on Information Forensics and Security, VOL. 12, NO. 4, APRIL 2017 [20] Detecting spammer on twitter; F. Benevenuto, G. Magno, T. Rodrigues, and V. Almeida; Proc. 7th Annu. Collaboration, Electron. Messaging, Anti-Abuse Spam Conf, 2010 [21] What is twitter, a social network or a news media; H. Kwak, C. Lee, H. Park, and S. Moon; Proc. 19th Int. Conf. World Wide Web, 2010 [22] A Survey of Spam Detection Methods on Twitter; Abdullah Talha Kabakus and Resul Kara, International Journal of Advanced Computer Science and Applications, Vol. 8, No. 3, 2017 [23] Investigating the deceptive information in Twitter spam; Chao Chen et al. ; Future Generation Computer Systems (2017) [24] WarningBird: A Near Real-Time Detection System for Suspicious URLs in Twitter Stream; Sangho Lee and Jong Kim; IEEE transactions on dependable and secure computing - 2016 [25] Reuters tracer: A large scale system of detecting & verifying real-time news events from twitter; Xiaomo Liu et al.; 25th ACM CIKM. 20 AUTHORS Vidhi Tiwari is a fourth-year student currently pursuing her bachelor’s degree in Computer Science Engineering from SRM Institute of Science and Technology. She has completed courses like Web development in Php and Data Science in R. Avid reader and learner, she is also interested in writing. Akarsh Srivastava is a fourth-year student currently pursuing his bachelor’s degree in COmputer Science from SRM Institute of Science and Technology. An active participant in the college club “Humanoid”, he has handled the coding department of the same, leading him to take part in hackathons. He has completed courses such as Machine learning and is interested in deep learning and neural nets.
  • 9. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 736 Mrs. P. Akhilandeswari holds the position of Assistant Professor at SRM Institute of Science and Technology in the department of Computer Science till date. She has had over 15 years of experience. Her interests in computing lie in IOT, data science and big data, and cloud computing.