Social network has become so popular with overwhelming high rate of growth, due to this popularity the online social networks is facing the issues of spamming, which has leads to unsubstantial economic loss to this menace of spam and spammers activities. It has leads to uncontrollable dissemination of viruses and malwares, promotional ads, phishing, and scams. spam activities has enter a new dangerous dimension, the spammers have step up their games and tactics online social networks, it consumes large amounts of network bandwidth leading to less revenue and significant economic loss to both private and public sectors. From the previous scholars work on spammer classification taxonomy, various machine learning techniques have been extensively used to detect spam activities and spammers in online social networks. There are various classifier that are learn over content-based features extracted from the user's interactions and profiles to label them as spam/spammers or legitimate. But recently, new network structural bench mark features have been proposed for spammer detection task, but their importance using structural bench mark learning methods has not been extensively evaluated yet. In this research work, we evaluate the the metric performance of some structural bench mark learning methods using scientific and strategic approach based attributes extracted from an interaction network for the task of spammer detection in online social network.
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
Retrieving Hidden Friends a Collusion Privacy Attack against Online Friend Se...ijtsrd
Online Social Networks OSNs are providing a diversity of application for human users to network through families, friends and even strangers. One of such application, friend search engine, allows the universal public to inquiry individual client friend lists and has been gaining popularity recently. Proper design, this application may incorrectly disclose client private relationship information. Existing work has a privacy perpetuation clarification that can effectively boost OSNs' sociability while protecting users' friendship privacy against attacks launched by individual malicious requestors. In this project proposed an advanced collusion attack, where a victim user's friendship privacy can be compromise from side to side a series of cautiously designed queries coordinately launched by multiple malicious requestors. The result of the proposed collusion attack is validate through synthetic and real world social network data sets. The project on the advanced collusion attacks will help us design a more vigorous and securer friend search engine on OSNs in the near future. R. Brintha | H. Parveen Bagum "Retrieving Hidden Friends a Collusion Privacy Attack against Online Friend Search Engine" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-4 , June 2020, URL: https://www.ijtsrd.com/papers/ijtsrd31687.pdf Paper Url :https://www.ijtsrd.com/computer-science/world-wide-web/31687/retrieving-hidden-friends-a-collusion-privacy-attack-against-online-friend-search-engine/r-brintha
An iac approach for detecting profile cloningIJNSA Journal
Nowadays, Online Social Networks (OSNs) are popular websites on the internet, which millions of users
register on and share their own personal information with others. Privacy threats and disclosing personal
information are the most important concerns of OSNs’ users. Recently, a new attack which is named
Identity Cloned Attack is detected on OSNs. In this attack the attacker tries to make a fake identity of a real
user in order to access to private information of the users’ friends which they do not publish on the public
profiles. In today OSNs, there are some verification services, but they are not active services and they are
useful for users who are familiar with online identity issues. In this paper, Identity cloned attacks are
explained in more details and a new and precise method to detect profile cloning in online social networks
is proposed. In this method, first, the social network is shown in a form of graph, then, according to
similarities among users, this graph is divided into smaller communities. Afterwards, all of the similar
profiles to the real profile are gathered (from the same community), then strength of relationship (among
all selected profiles and the real profile) is calculated, and those which have the less strength of
relationship will be verified by mutual friend system. In this study, in order to evaluate the effectiveness of
proposed method, all steps are applied on a dataset of Facebook, and finally this work is compared with
two previous works by applying them on the dataset.
Comprehensive Social Media Security Analysis & XKeyscore Espionage TechnologyCSCJournals
Social networks can offer many services to the users for sharing activities events and their ideas. Many attacks can happened to the social networking websites due to trust that have been given by the users. Cyber threats are discussed in this paper. We study the types of cyber threats, classify them and give some suggestions to protect social networking websites of variety of attacks. Moreover, we gave some antithreats strategies with future trends.
Authorization mechanism for multiparty data sharing in social networkeSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Benchmarking the Privacy-Preserving People SearchDaqing He
This document discusses benchmarking privacy-preserving people search. Social match is important in people search because tighter social similarity makes it easier to connect people. However, privacy is a big concern with people search using social networks due to users opting out of sharing information or providing incomplete data. The research aims to obtain a privacy-preserving social network by simulating with public coauthor networks. Both global and local network features are important for people search performance, and accounting for privacy concerns in these features can significantly impact search results. The local network feature, relating to direct connections, has more influence than global features relating to whole network propagation.
2010 Catalyst Conference - Trends in Social Network AnalysisMarc Smith
Review of trends related to social network analysis in the enterprise. Presented at the 2010 Catalyst Conference in San Diego, CA july 29, 2010. Presented with Mike Gotta, Gartner Group.
The document discusses a study that examines the correlation between actor centrality in social networks and their ability to coordinate projects. It outlines the research framework, which involves extracting coordination-related phrases from emails, calculating coordination scores bounded by project scopes, constructing social network matrices using centrality measures, and testing the association between centrality and coordination. Preliminary results on the Enron email network from 1997-2002 are presented. The methodology involves text mining the Enron dataset to calculate coordination scores and social network centrality metrics like degree, closeness, and betweenness centrality.
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
Retrieving Hidden Friends a Collusion Privacy Attack against Online Friend Se...ijtsrd
Online Social Networks OSNs are providing a diversity of application for human users to network through families, friends and even strangers. One of such application, friend search engine, allows the universal public to inquiry individual client friend lists and has been gaining popularity recently. Proper design, this application may incorrectly disclose client private relationship information. Existing work has a privacy perpetuation clarification that can effectively boost OSNs' sociability while protecting users' friendship privacy against attacks launched by individual malicious requestors. In this project proposed an advanced collusion attack, where a victim user's friendship privacy can be compromise from side to side a series of cautiously designed queries coordinately launched by multiple malicious requestors. The result of the proposed collusion attack is validate through synthetic and real world social network data sets. The project on the advanced collusion attacks will help us design a more vigorous and securer friend search engine on OSNs in the near future. R. Brintha | H. Parveen Bagum "Retrieving Hidden Friends a Collusion Privacy Attack against Online Friend Search Engine" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-4 , June 2020, URL: https://www.ijtsrd.com/papers/ijtsrd31687.pdf Paper Url :https://www.ijtsrd.com/computer-science/world-wide-web/31687/retrieving-hidden-friends-a-collusion-privacy-attack-against-online-friend-search-engine/r-brintha
An iac approach for detecting profile cloningIJNSA Journal
Nowadays, Online Social Networks (OSNs) are popular websites on the internet, which millions of users
register on and share their own personal information with others. Privacy threats and disclosing personal
information are the most important concerns of OSNs’ users. Recently, a new attack which is named
Identity Cloned Attack is detected on OSNs. In this attack the attacker tries to make a fake identity of a real
user in order to access to private information of the users’ friends which they do not publish on the public
profiles. In today OSNs, there are some verification services, but they are not active services and they are
useful for users who are familiar with online identity issues. In this paper, Identity cloned attacks are
explained in more details and a new and precise method to detect profile cloning in online social networks
is proposed. In this method, first, the social network is shown in a form of graph, then, according to
similarities among users, this graph is divided into smaller communities. Afterwards, all of the similar
profiles to the real profile are gathered (from the same community), then strength of relationship (among
all selected profiles and the real profile) is calculated, and those which have the less strength of
relationship will be verified by mutual friend system. In this study, in order to evaluate the effectiveness of
proposed method, all steps are applied on a dataset of Facebook, and finally this work is compared with
two previous works by applying them on the dataset.
Comprehensive Social Media Security Analysis & XKeyscore Espionage TechnologyCSCJournals
Social networks can offer many services to the users for sharing activities events and their ideas. Many attacks can happened to the social networking websites due to trust that have been given by the users. Cyber threats are discussed in this paper. We study the types of cyber threats, classify them and give some suggestions to protect social networking websites of variety of attacks. Moreover, we gave some antithreats strategies with future trends.
Authorization mechanism for multiparty data sharing in social networkeSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Benchmarking the Privacy-Preserving People SearchDaqing He
This document discusses benchmarking privacy-preserving people search. Social match is important in people search because tighter social similarity makes it easier to connect people. However, privacy is a big concern with people search using social networks due to users opting out of sharing information or providing incomplete data. The research aims to obtain a privacy-preserving social network by simulating with public coauthor networks. Both global and local network features are important for people search performance, and accounting for privacy concerns in these features can significantly impact search results. The local network feature, relating to direct connections, has more influence than global features relating to whole network propagation.
2010 Catalyst Conference - Trends in Social Network AnalysisMarc Smith
Review of trends related to social network analysis in the enterprise. Presented at the 2010 Catalyst Conference in San Diego, CA july 29, 2010. Presented with Mike Gotta, Gartner Group.
The document discusses a study that examines the correlation between actor centrality in social networks and their ability to coordinate projects. It outlines the research framework, which involves extracting coordination-related phrases from emails, calculating coordination scores bounded by project scopes, constructing social network matrices using centrality measures, and testing the association between centrality and coordination. Preliminary results on the Enron email network from 1997-2002 are presented. The methodology involves text mining the Enron dataset to calculate coordination scores and social network centrality metrics like degree, closeness, and betweenness centrality.
Integrated approach to detect spam in social media networks using hybrid feat...IJECEIAES
Online social networking sites are becoming more popular amongst Internet users. The Internet users spend some amount of time on popular social networking sites like Facebook, Twitter and LinkedIn etc. Online social networks are considered to be much useful tool to the society used by Internet lovers to communicate and transmit information. These social networking platforms are useful to share information, opinions and ideas, make new friends, and create new friend groups. Social networking sites provide large amount of technical information to the users. This large amount of information in social networking sites attracts cyber criminals to misuse these sites information. These users create their own accounts and spread vulnerable information to the genuine users. This information may be advertising some product, send some malicious links etc to disturb the natural users on social sites. Spammer detection is a major problem now days in social networking sites. Previous spam detection techniques use different set of features to classify spam and non spam users. In this paper we proposed a hybrid approach which uses content based and user based features for identification of spam on Twitter network. In this hybrid approach we used decision tree induction algorithm and Bayesian network algorithm to construct a classification model. We have analysed the proposed technique on twitter dataset. Our analysis shows that our proposed methodology is better than some other existing techniques.
The advancement of Information Technology has hastened the ability to disseminate information across the globe. In particular, the recent trends in ‘Social Networking’ have led to a spark in personally sensitive information being published on the World Wide Web. While such socially active websites are creative tools for expressing one’s personality it also entails serious privacy concerns. Thus, Social Networking websites could be termed a double edged sword. It is important for the law to keep abreast of these developments in technology. The purpose of this paper is to demonstrate the limits of extending existing laws to battle privacy intrusions in the Internet especially in the context of social networking. It is suggested that privacy specific legislation is the most appropriate means of protecting online privacy. In doing so it is important to maintain a balance between the competing right of expression, the failure of which may hinder the reaping of benefits offered by Internet technology
Presentation-Detecting Spammers on Social NetworksAshish Arora
This document summarizes research on detecting spammers on social networks. The researchers created fake "honeypot" profiles on Facebook, Twitter, and MySpace to collect data from spammers over 12 months. They analyzed the data to identify patterns in spam bots and campaigns. Machine learning techniques were used to develop features that detected spammers with 2-3% error rates. The techniques help social networks improve security and identify malicious users spreading spam.
This document discusses using trust and provenance for content filtering on the semantic web. It describes how trust can be inferred between individuals in social networks and how that trust information can be used to filter and recommend content. Applications mentioned include using trust ratings to help intelligence analysts sort through information by identifying reliable sources.
Detecting fake news_with_weak_social_supervisionSuresh S
This document discusses using weak social supervision for detecting fake news on social media. It defines weak supervision and describes how social media data provides a new type of weak supervision called weak social supervision. Weak social supervision can be generated from three aspects of social media - users, posts, and networks. Recent works have shown that user-based, post-based, and network-based weak social supervision can be effective for fake news detection, even with limited labeled data. Weak social supervision is a promising approach for learning tasks where labeled data is scarce.
This document discusses data mining in social networks. It covers topics like social network analysis, graph mining, and text mining on social media platforms. Graph mining is used to understand relationships and extract communities from social networks. Text mining techniques like clustering and anomaly detection are applied to textual data from blogs, messages, etc. on social platforms. The document also discusses accessing Facebook data through its API and SDK, and applications and limitations of social network analysis.
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET Journal
1) The document presents a machine learning approach to detect phishing URLs using logistic regression. It trains a logistic regression model on a dataset of 420,467 URLs that have been classified as either phishing or legitimate.
2) It preprocesses the URLs using tokenization before training the logistic regression model. The trained model is able to classify new URLs with 96% accuracy as either phishing or legitimate based on the URL features.
3) The proposed approach provides an automated way to detect phishing URLs in real-time and help prevent phishing attacks. Future work could involve developing a browser extension using this approach and increasing the dataset size for higher accuracy.
This document provides an overview of social network analysis from online behavior and data mining techniques. It discusses classical social network analysis approaches and frameworks. Various types of social network analysis are described, including whole network analysis and personal network analysis. Models and measures used in social network analysis are also outlined.
This document discusses data mining techniques for social media. It begins by reviewing the growth of popular social media sites like Facebook, YouTube, and Twitter. It then discusses how social media generates huge amounts of user data through interactions and content sharing. The document outlines opportunities to use data mining on social networks to gain insights into human behavior, marketing analytics, and more. It reviews common problems studied, like community detection, node classification, and modeling information flow. The conclusion emphasizes that social media provides a massive, open dataset for developing recommender systems and targeting marketing through predictive analysis of user interests and trends.
Identification of inference attacks on private Information from Social Networkseditorjournal
Online social networks, like
Facebook, twitter are increasingly utilized by
many people. These networks permit users to
publish details about them and to connect to
their friends. Some of the details revealed
inside these networks are meant to be
keeping private. Yet it is possible to use
learning algorithms and methods on released
data have to predict private information,
which cause inference attacks. This paper
discovers how to launch inference attacks
using released social networking details to
predict private information’s. It then
separate three possible sanitization
algorithms that could be used in various
situations. Then, it investigates the
effectiveness of these techniques and tries to
use methods of collective inference
techniques to determine sensitive attributes
of the user data set. It shows that it can
decline the effectiveness of both the local and
relational classification algorithms by using
the sanitization methods we described.
Dear Students
Ingenious techno Solution offers an expertise guidance on you Final Year IEEE & Non- IEEE Projects on the following domain
JAVA
.NET
EMBEDDED SYSTEMS
ROBOTICS
MECHANICAL
MATLAB etc
For further details contact us:
enquiry@ingenioustech.in
044-42046028 or 8428302179.
Ingenious Techno Solution
#241/85, 4th floor
Rangarajapuram main road,
Kodambakkam (Power House)
http://www.ingenioustech.in/
YOURPRIVACYPROTECTOR: A RECOMMENDER SYSTEM FOR PRIVACY SETTINGS IN SOCIAL NET...ijsptm
Ensuring privacy of users of social networks is probably an unsolvable conundrum. At the same time, an informed use of the existing privacy options by the social network participants may alleviate - or even prevent - some of the more drastic privacy-averse incidents. Unfortunately, recent surveys show that an average user is either not aware of these options or does not use them, probably due to their perceived complexity. It is therefore reasonable to believe that tools assisting users with two tasks: 1) understanding their social net behaviour in terms of their privacy settings and broad privacy categories, and 2)recommending reasonable privacy options, will be a valuable tool for everyday privacy practice in a social
network context. This paper presents YourPrivacyProtector, a recommender system that shows how simple machine learning techniques may provide useful assistance in these two tasks to Facebook users. We support our claim with empirical results of application of YourPrivacyProtector to two groups of Facebook
users.
This document discusses data mining techniques for social media. It explains that graph mining is used to cluster similar data together based on relationships and connections between nodes in a graph. Graph mining on Facebook can be used to search for friends, places, and interests while ranking results by strength of connections. Text mining extracts meaningful information from unstructured text data on social networks and can automatically process emails, classify texts, and potentially extract full information from websites.
In online social media platforms, users can express their ideas by posting original content or by adding comments and responses to existing posts, thus generating virtual discussions and conversations. Studying these conversations is essential for understanding the online communication behavior of users. This study proposes a novel approach to retrieve popular patterns on online conversations using network-based analysis. The analysis consists of two main stages: intent analysis and network generation. Users’ intention is detected using keyword-based categorization of posts and comments, integrated with classification through Naïve Bayes and Support Vector Machine algorithms for uncategorized comments. A continuous human-in-the-loop approach further improves the keyword-based classification. To build and understand communication patterns among the users, we build conversation graphs starting from the hierarchical structure of posts and comments, using a directed multigraph network. The experiments categorize 90% comments with 98% accuracy on a real social media dataset. The model then identifies relevant patterns in terms of shape and content; and finally determines the relevance and frequency of the patterns. Results show that the most popular online discussion patterns obtained from conversation graphs resemble real-life interactions and communication.
Filter unwanted messages from walls and blocking non legitimate users in osnIAEME Publication
1. The document presents a system to filter unwanted messages from user walls in online social networks. It aims to give users more control over the content that appears on their walls.
2. A machine learning classifier is used to automatically label messages by category. Users can then specify filtering rules to block certain categories or keywords from appearing.
3. The system also implements a blacklist to temporarily or permanently block users who frequently post unwanted content, as determined by filtering rules and a threshold.
This document summarizes research on how characteristics of social media profiles impact perceptions of source credibility. Specifically, it examines how the number of followers and the ratio of followers to follows on Twitter profiles affect judgments of trustworthiness, competence, and goodwill. The research aims to identify factors that influence how people evaluate the credibility of information from social media sources.
This document discusses methods for collecting reliable data on user behavior from online social networks. It reviews previous research that has collected social network data through OSN APIs/crawling, surveys/questionnaires, and custom application deployment. The authors argue that these methods can produce biased data as they often only collect publicly shared content or rely on self-reported information. The authors propose combining existing methods, specifically the Experience Sampling Method, to collect more reliable data by capturing both shared and non-shared user behaviors and contexts through in situ experiments.
This paper proposes a system to detect spammer tweets on Twitter. It extracts features from tweets such as the number of unique mentions, unsolicited mentions, and duplicate domain names. It also uses the Google Safe Browsing API to analyze URLs. The system fetches tweets for a particular hashtag using the Twitter4J API. It then performs preprocessing like removing hashtags and quotes. The features are then used to classify tweets as spam or not spam using machine learning algorithms. The system aims to investigate linguistic features for detecting spam accounts and tweets in a supervised manner by leveraging existing hashtags for training data.
A Machine Learning Ensemble Model for the Detection of Cyberbullyinggerogepatton
The pervasive use of social media platforms, such as Facebook, Instagram, and X, has significantly amplified
our electronic interconnectedness. Moreover, these platforms are now easily accessible from any location at
any given time. However, the increased popularity of social media has also led to cyberbullying.It is imperative
to address the need for finding, monitoring, and mitigating cyberbullying posts on social media platforms.
Motivated by this necessity, we present this paper to contribute to developing an automated system for
detecting binary labels of aggressive tweets.Our study has demonstrated remarkable performance compared to
previous experiments on the same dataset. We employed the stacking ensemble machine learning method,
utilizing four various feature extraction techniques to optimize performance within the stacking ensemble
learning framework. Combining five machine learning algorithms,Decision Trees, Random Forest, Linear
Support Vector Classification, Logistic Regression, and K-Nearest Neighbors into an ensemble method, we
achieved superior results compared to traditional machine learning classifier models. The stacking classifier
achieved a high accuracy rate of 94.00%, outperforming traditional machine learning models and surpassing
the results of prior experiments that utilized the same dataset. The outcomes of our experiments showcased an
accuracy rate of 0.94% in detection tweets as aggressive or non-aggressive.
A MACHINE LEARNING ENSEMBLE MODEL FOR THE DETECTION OF CYBERBULLYINGijaia
The pervasive use of social media platforms, such as Facebook, Instagram, and X, has significantly amplified
our electronic interconnectedness. Moreover, these platforms are now easily accessible from any location at
any given time. However, the increased popularity of social media has also led to cyberbullying.It is imperative
to address the need for finding, monitoring, and mitigating cyberbullying posts on social media platforms.
Motivated by this necessity, we present this paper to contribute to developing an automated system for
detecting binary labels of aggressive tweets.Our study has demonstrated remarkable performance compared to
previous experiments on the same dataset. We employed the stacking ensemble machine learning method,
utilizing four various feature extraction techniques to optimize performance within the stacking ensemble
learning framework. Combining five machine learning algorithms,Decision Trees, Random Forest, Linear
Support Vector Classification, Logistic Regression, and K-Nearest Neighbors into an ensemble method, we
achieved superior results compared to traditional machine learning classifier models. The stacking classifier
achieved a high accuracy rate of 94.00%, outperforming traditional machine learning models and surpassing
the results of prior experiments that utilized the same dataset. The outcomes of our experiments showcased an
accuracy rate of 0.94% in detection tweets as aggressive or non-aggressive.
Integrated approach to detect spam in social media networks using hybrid feat...IJECEIAES
Online social networking sites are becoming more popular amongst Internet users. The Internet users spend some amount of time on popular social networking sites like Facebook, Twitter and LinkedIn etc. Online social networks are considered to be much useful tool to the society used by Internet lovers to communicate and transmit information. These social networking platforms are useful to share information, opinions and ideas, make new friends, and create new friend groups. Social networking sites provide large amount of technical information to the users. This large amount of information in social networking sites attracts cyber criminals to misuse these sites information. These users create their own accounts and spread vulnerable information to the genuine users. This information may be advertising some product, send some malicious links etc to disturb the natural users on social sites. Spammer detection is a major problem now days in social networking sites. Previous spam detection techniques use different set of features to classify spam and non spam users. In this paper we proposed a hybrid approach which uses content based and user based features for identification of spam on Twitter network. In this hybrid approach we used decision tree induction algorithm and Bayesian network algorithm to construct a classification model. We have analysed the proposed technique on twitter dataset. Our analysis shows that our proposed methodology is better than some other existing techniques.
The advancement of Information Technology has hastened the ability to disseminate information across the globe. In particular, the recent trends in ‘Social Networking’ have led to a spark in personally sensitive information being published on the World Wide Web. While such socially active websites are creative tools for expressing one’s personality it also entails serious privacy concerns. Thus, Social Networking websites could be termed a double edged sword. It is important for the law to keep abreast of these developments in technology. The purpose of this paper is to demonstrate the limits of extending existing laws to battle privacy intrusions in the Internet especially in the context of social networking. It is suggested that privacy specific legislation is the most appropriate means of protecting online privacy. In doing so it is important to maintain a balance between the competing right of expression, the failure of which may hinder the reaping of benefits offered by Internet technology
Presentation-Detecting Spammers on Social NetworksAshish Arora
This document summarizes research on detecting spammers on social networks. The researchers created fake "honeypot" profiles on Facebook, Twitter, and MySpace to collect data from spammers over 12 months. They analyzed the data to identify patterns in spam bots and campaigns. Machine learning techniques were used to develop features that detected spammers with 2-3% error rates. The techniques help social networks improve security and identify malicious users spreading spam.
This document discusses using trust and provenance for content filtering on the semantic web. It describes how trust can be inferred between individuals in social networks and how that trust information can be used to filter and recommend content. Applications mentioned include using trust ratings to help intelligence analysts sort through information by identifying reliable sources.
Detecting fake news_with_weak_social_supervisionSuresh S
This document discusses using weak social supervision for detecting fake news on social media. It defines weak supervision and describes how social media data provides a new type of weak supervision called weak social supervision. Weak social supervision can be generated from three aspects of social media - users, posts, and networks. Recent works have shown that user-based, post-based, and network-based weak social supervision can be effective for fake news detection, even with limited labeled data. Weak social supervision is a promising approach for learning tasks where labeled data is scarce.
This document discusses data mining in social networks. It covers topics like social network analysis, graph mining, and text mining on social media platforms. Graph mining is used to understand relationships and extract communities from social networks. Text mining techniques like clustering and anomaly detection are applied to textual data from blogs, messages, etc. on social platforms. The document also discusses accessing Facebook data through its API and SDK, and applications and limitations of social network analysis.
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET Journal
1) The document presents a machine learning approach to detect phishing URLs using logistic regression. It trains a logistic regression model on a dataset of 420,467 URLs that have been classified as either phishing or legitimate.
2) It preprocesses the URLs using tokenization before training the logistic regression model. The trained model is able to classify new URLs with 96% accuracy as either phishing or legitimate based on the URL features.
3) The proposed approach provides an automated way to detect phishing URLs in real-time and help prevent phishing attacks. Future work could involve developing a browser extension using this approach and increasing the dataset size for higher accuracy.
This document provides an overview of social network analysis from online behavior and data mining techniques. It discusses classical social network analysis approaches and frameworks. Various types of social network analysis are described, including whole network analysis and personal network analysis. Models and measures used in social network analysis are also outlined.
This document discusses data mining techniques for social media. It begins by reviewing the growth of popular social media sites like Facebook, YouTube, and Twitter. It then discusses how social media generates huge amounts of user data through interactions and content sharing. The document outlines opportunities to use data mining on social networks to gain insights into human behavior, marketing analytics, and more. It reviews common problems studied, like community detection, node classification, and modeling information flow. The conclusion emphasizes that social media provides a massive, open dataset for developing recommender systems and targeting marketing through predictive analysis of user interests and trends.
Identification of inference attacks on private Information from Social Networkseditorjournal
Online social networks, like
Facebook, twitter are increasingly utilized by
many people. These networks permit users to
publish details about them and to connect to
their friends. Some of the details revealed
inside these networks are meant to be
keeping private. Yet it is possible to use
learning algorithms and methods on released
data have to predict private information,
which cause inference attacks. This paper
discovers how to launch inference attacks
using released social networking details to
predict private information’s. It then
separate three possible sanitization
algorithms that could be used in various
situations. Then, it investigates the
effectiveness of these techniques and tries to
use methods of collective inference
techniques to determine sensitive attributes
of the user data set. It shows that it can
decline the effectiveness of both the local and
relational classification algorithms by using
the sanitization methods we described.
Dear Students
Ingenious techno Solution offers an expertise guidance on you Final Year IEEE & Non- IEEE Projects on the following domain
JAVA
.NET
EMBEDDED SYSTEMS
ROBOTICS
MECHANICAL
MATLAB etc
For further details contact us:
enquiry@ingenioustech.in
044-42046028 or 8428302179.
Ingenious Techno Solution
#241/85, 4th floor
Rangarajapuram main road,
Kodambakkam (Power House)
http://www.ingenioustech.in/
YOURPRIVACYPROTECTOR: A RECOMMENDER SYSTEM FOR PRIVACY SETTINGS IN SOCIAL NET...ijsptm
Ensuring privacy of users of social networks is probably an unsolvable conundrum. At the same time, an informed use of the existing privacy options by the social network participants may alleviate - or even prevent - some of the more drastic privacy-averse incidents. Unfortunately, recent surveys show that an average user is either not aware of these options or does not use them, probably due to their perceived complexity. It is therefore reasonable to believe that tools assisting users with two tasks: 1) understanding their social net behaviour in terms of their privacy settings and broad privacy categories, and 2)recommending reasonable privacy options, will be a valuable tool for everyday privacy practice in a social
network context. This paper presents YourPrivacyProtector, a recommender system that shows how simple machine learning techniques may provide useful assistance in these two tasks to Facebook users. We support our claim with empirical results of application of YourPrivacyProtector to two groups of Facebook
users.
This document discusses data mining techniques for social media. It explains that graph mining is used to cluster similar data together based on relationships and connections between nodes in a graph. Graph mining on Facebook can be used to search for friends, places, and interests while ranking results by strength of connections. Text mining extracts meaningful information from unstructured text data on social networks and can automatically process emails, classify texts, and potentially extract full information from websites.
In online social media platforms, users can express their ideas by posting original content or by adding comments and responses to existing posts, thus generating virtual discussions and conversations. Studying these conversations is essential for understanding the online communication behavior of users. This study proposes a novel approach to retrieve popular patterns on online conversations using network-based analysis. The analysis consists of two main stages: intent analysis and network generation. Users’ intention is detected using keyword-based categorization of posts and comments, integrated with classification through Naïve Bayes and Support Vector Machine algorithms for uncategorized comments. A continuous human-in-the-loop approach further improves the keyword-based classification. To build and understand communication patterns among the users, we build conversation graphs starting from the hierarchical structure of posts and comments, using a directed multigraph network. The experiments categorize 90% comments with 98% accuracy on a real social media dataset. The model then identifies relevant patterns in terms of shape and content; and finally determines the relevance and frequency of the patterns. Results show that the most popular online discussion patterns obtained from conversation graphs resemble real-life interactions and communication.
Filter unwanted messages from walls and blocking non legitimate users in osnIAEME Publication
1. The document presents a system to filter unwanted messages from user walls in online social networks. It aims to give users more control over the content that appears on their walls.
2. A machine learning classifier is used to automatically label messages by category. Users can then specify filtering rules to block certain categories or keywords from appearing.
3. The system also implements a blacklist to temporarily or permanently block users who frequently post unwanted content, as determined by filtering rules and a threshold.
This document summarizes research on how characteristics of social media profiles impact perceptions of source credibility. Specifically, it examines how the number of followers and the ratio of followers to follows on Twitter profiles affect judgments of trustworthiness, competence, and goodwill. The research aims to identify factors that influence how people evaluate the credibility of information from social media sources.
This document discusses methods for collecting reliable data on user behavior from online social networks. It reviews previous research that has collected social network data through OSN APIs/crawling, surveys/questionnaires, and custom application deployment. The authors argue that these methods can produce biased data as they often only collect publicly shared content or rely on self-reported information. The authors propose combining existing methods, specifically the Experience Sampling Method, to collect more reliable data by capturing both shared and non-shared user behaviors and contexts through in situ experiments.
This paper proposes a system to detect spammer tweets on Twitter. It extracts features from tweets such as the number of unique mentions, unsolicited mentions, and duplicate domain names. It also uses the Google Safe Browsing API to analyze URLs. The system fetches tweets for a particular hashtag using the Twitter4J API. It then performs preprocessing like removing hashtags and quotes. The features are then used to classify tweets as spam or not spam using machine learning algorithms. The system aims to investigate linguistic features for detecting spam accounts and tweets in a supervised manner by leveraging existing hashtags for training data.
A Machine Learning Ensemble Model for the Detection of Cyberbullyinggerogepatton
The pervasive use of social media platforms, such as Facebook, Instagram, and X, has significantly amplified
our electronic interconnectedness. Moreover, these platforms are now easily accessible from any location at
any given time. However, the increased popularity of social media has also led to cyberbullying.It is imperative
to address the need for finding, monitoring, and mitigating cyberbullying posts on social media platforms.
Motivated by this necessity, we present this paper to contribute to developing an automated system for
detecting binary labels of aggressive tweets.Our study has demonstrated remarkable performance compared to
previous experiments on the same dataset. We employed the stacking ensemble machine learning method,
utilizing four various feature extraction techniques to optimize performance within the stacking ensemble
learning framework. Combining five machine learning algorithms,Decision Trees, Random Forest, Linear
Support Vector Classification, Logistic Regression, and K-Nearest Neighbors into an ensemble method, we
achieved superior results compared to traditional machine learning classifier models. The stacking classifier
achieved a high accuracy rate of 94.00%, outperforming traditional machine learning models and surpassing
the results of prior experiments that utilized the same dataset. The outcomes of our experiments showcased an
accuracy rate of 0.94% in detection tweets as aggressive or non-aggressive.
A MACHINE LEARNING ENSEMBLE MODEL FOR THE DETECTION OF CYBERBULLYINGijaia
The pervasive use of social media platforms, such as Facebook, Instagram, and X, has significantly amplified
our electronic interconnectedness. Moreover, these platforms are now easily accessible from any location at
any given time. However, the increased popularity of social media has also led to cyberbullying.It is imperative
to address the need for finding, monitoring, and mitigating cyberbullying posts on social media platforms.
Motivated by this necessity, we present this paper to contribute to developing an automated system for
detecting binary labels of aggressive tweets.Our study has demonstrated remarkable performance compared to
previous experiments on the same dataset. We employed the stacking ensemble machine learning method,
utilizing four various feature extraction techniques to optimize performance within the stacking ensemble
learning framework. Combining five machine learning algorithms,Decision Trees, Random Forest, Linear
Support Vector Classification, Logistic Regression, and K-Nearest Neighbors into an ensemble method, we
achieved superior results compared to traditional machine learning classifier models. The stacking classifier
achieved a high accuracy rate of 94.00%, outperforming traditional machine learning models and surpassing
the results of prior experiments that utilized the same dataset. The outcomes of our experiments showcased an
accuracy rate of 0.94% in detection tweets as aggressive or non-aggressive.
BINARY TEXT CLASSIFICATION OF CYBER HARASSMENT USING DEEP LEARNINGIRJET Journal
This document discusses the development of a cyberharassment detection system to identify abusive content on social media platforms. It reviews related works that have used machine learning techniques like convolutional neural networks and transfer learning models to detect cyberbullying. The authors investigate four neural network optimizers - Rmsprop, Adam, Adadelta, and Adagrad - and find that Rmsprop achieved the highest accuracy of 98.45% at classifying harassing content. The goal of this research is to create an effective model for automatically detecting cyberharassment online.
A Machine Learning Ensemble Model for the Detection of Cyberbullyinggerogepatton
The pervasive use of social media platforms, such as Facebook, Instagram, and X, has significantly amplified
our electronic interconnectedness. Moreover, these platforms are now easily accessible from any location at
any given time. However, the increased popularity of social media has also led to cyberbullying.It is imperative
to address the need for finding, monitoring, and mitigating cyberbullying posts on social media platforms.
Motivated by this necessity, we present this paper to contribute to developing an automated system for
detecting binary labels of aggressive tweets.Our study has demonstrated remarkable performance compared to
previous experiments on the same dataset. We employed the stacking ensemble machine learning method,
utilizing four various feature extraction techniques to optimize performance within the stacking ensemble
learning framework. Combining five machine learning algorithms,Decision Trees, Random Forest, Linear
Support Vector Classification, Logistic Regression, and K-Nearest Neighbors into an ensemble method, we
achieved superior results compared to traditional machine learning classifier models. The stacking classifier
achieved a high accuracy rate of 94.00%, outperforming traditional machine learning models and surpassing
the results of prior experiments that utilized the same dataset. The outcomes of our experiments showcased an
accuracy rate of 0.94% in detection tweets as aggressive or non-aggressive.
Social media websites are becoming more prevalent on the Internet. Sites, such as Twitter, Facebook, and Instagram, spend significantly more of their time on users online. People in social media share thoughts, views, and facts and create new acquaintances. Social media sites supply users with a great deal of useful information. This enormous quantity of social media information invites hackers to abuse data. These hackers establish fraudulent profiles for actual people and distribute useless material. The material on spam might include commercials and harmful URLs that disrupt natural users. This spam content is a massive problem in social networks. Spam identification is a vital procedure on social media networking platforms. In this paper, we have proposed a spam detection artificial intelligence technique for Twitter social networks. In this approach, we employed a vector support machine, a neural artificial network, and a random forest technique to build a model. The results indicate that, compared with RF and ANN algorithms, the suggested support vector machine algorithm has the greatest precision, recall, and Fmeasure. The findings of this paper would be useful in monitoring and tracking social media shared photos for the identification of inappropriate content and forged images and to safeguard social media from digital threats and attacks.
PHISHING DETECTION IN IMS USING DOMAIN ONTOLOGY AND CBA – AN INNOVATIVE RULE ...ijistjournal
User ignorance towards the use of communication services like Instant Messengers, emails, websites, social networks etc. is becoming the biggest advantage for phishers. It is required to create technical awareness in users by educating them to create a phishing detection application which would generate phishing alerts for the user so that phishing messages are not ignored. The lack of basic security features to detect and prevent phishing has had a profound effect on the IM clients, as they lose their faith in e-banking and e-commerce transactions, which will have a disastrous impact on the corporate and banking sectors and businesses which rely heavily on the internet. Very little research contributions were available in for phishing detection in Instant messengers. A context based, dynamic and intelligent phishing detection methodology in IMs is proposed, to analyze and detect phishing in Instant Messages with relevance to domain ontology (OBIE) and utilizes the Classification based on Association (CBA) for generating phishing rules and alerting the victims. A PDS Monitoring system algorithm is used to identify the phishing activity during exchange of messages in IMs, with high ratio of precision and recall. The results have shown improvement by the increased percentage of precision and recall when compared to the existing methods.
Terrorism Analysis through Social Media using Data MiningIRJET Journal
This document presents a study that uses deep learning models like Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN) to analyze terrorism through detecting toxicity in social media text data. The study aims to classify text data into categories like toxicity, severe toxicity, obscenity, threat, insult or identity hate. It provides an overview of DNN and CNN models for text classification and compares their methodology, architecture and performance. The models are trained on preprocessed social media data related to terrorist activities and aim to accurately predict the toxicity level and classify tweets for concerned authorities to make informed decisions.
A Survey of Methods for Spotting Spammers on Twitterijtsrd
Social networking sites explosive expansion as a means of information sharing, management, communication, storage, and management has attracted hackers who abuse the Web to take advantage of security flaws for their own nefarious ends. Every day, forged internet accounts are compromised. Online social networks OSNs are rife with impersonators, phishers, scammers, and spammers who are difficult to spot. Users who send unsolicited communications to a large audience with the objective of advertising a product, entice victims to click on harmful links, or infect users systems only for financial gain are known as spammers. Many studies have been conducted to identify spam profiles in OSNs. In this essay, we have discussed the methods currently in use to identify spam Twitter users. User based, content based, or a combination of both features could be used to identify spammers. The current paper gives a summary of the traits, methodologies, detection rates, and restrictions if any for identifying spam profiles, primarily on Twitter. Hareesha Devi | Pankaj Verma | Ankit Dhiman "A Survey of Methods for Spotting Spammers on Twitter" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7 | Issue-3 , June 2023, URL: https://www.ijtsrd.com.com/papers/ijtsrd57439.pdf Paper URL: https://www.ijtsrd.com.com/computer-science/artificial-intelligence/57439/a-survey-of-methods-for-spotting-spammers-on-twitter/hareesha-devi
Phishing detection in ims using domain ontology and cba an innovative rule ...ijistjournal
User ignorance towards the use of communication services like Instant Messengers, emails, websites, social networks etc. is becoming the biggest advantage for phishers. It is required to create technical awareness in users by educating them to create a phishing detection application which would generate phishing alerts for the user so that phishing messages are not ignored. The lack of basic security features to detect and prevent phishing has had a profound effect on the IM clients, as they lose their faith in e-banking and e-commerce transactions, which will have a disastrous impact on the corporate and banking sectors and businesses which rely heavily on the internet.Very little research contributions were available in for phishing detection in Instant messengers. A context
based, dynamic and intelligent phishing detection
methodology in IMs is proposed, to analyze and detect phishing in Instant Messages with relevance to domain ontology (OBIE) and utilizes the Classification based on Association (CBA) for generating phishing rules and alerting the victims. A PDS Monitoring system algorithm is used to identify the phishing activity during exchange of messages in IMs, with high ratio of precision and recall. The results have shown improvement by the increased percentage of precision and recall when compared to the existing methods.
Spammer Detection and Fake User Identification on Social NetworksIRJET Journal
This document discusses methods for detecting spammers and fake users on social networks like Twitter. It provides a literature review of past research on spam detection techniques on Twitter. The techniques are categorized based on their ability to detect: (1) fake content, (2) spam based on URLs, (3) spam in popular topics, and (4) fake users. The techniques are also analyzed based on attributes like user attributes, content attributes, graph attributes, structural attributes, and time attributes. An implementation approach is proposed that involves collecting user data, employing machine learning algorithms like random forest for pattern recognition, using feature engineering, and integrating social media APIs for real-time monitoring. It also discusses integrating a user reporting mechanism to improve accuracy
BullyNet is a three-phase algorithm that detects cyberbullies on Twitter. It first constructs a cyberbullying signed network (SN) by analyzing tweets and their contexts to determine sentiment and assign bullying scores to connections between users. It then proposes a centrality measure called attitude and merit (A&M) to identify cyberbullies from the SN. The algorithm was tested on a dataset of 5.6 million tweets and effectively detected cyberbullies with high accuracy while scaling to large numbers of tweets.
Towards a hybrid recommendation approach using a community detection and eval...IJECEIAES
In social learning platforms, community detection algorithms are used to identify groups of learners with similar interests, behavior, and levels. While, recommendation algorithms personalize the learning experience based on learners' profile information, including interests and past behavior. Combining these algorithms can improve the recommendation quality by identifying learners with similar needs and interests for more accurate and relevant suggestions. Community detection enhances recommendations by identifying groups of learners with similar needs and interests. Leveraging their similarities, recommendation algorithms generate more accurate suggestions. In this article, we propose a novel approach that combines community detection and recommendation algorithms into a single framework to provide learners with personalized recommendations and opportunities for collaborative learning. Our proposed approach consists of three steps: first, applying the maximal clique-based algorithm to detect learning communities with common characteristics and interests; second, evaluating learners within their communities using static and dynamic evaluation; and third, generating personalized recommendations within each detected cluster using a recommendation system based on correlation and co-occurrence. To evaluate the effectiveness of our proposed approach, we conducted experiments on a real-world dataset. Our results show that our approach outperforms existing methods in terms of modularity, precision, and accuracy.
This document summarizes 12 sources on topics related to cybersecurity threats such as spam, phishing, ransomware, and malware detection techniques. The sources discuss using machine learning algorithms and heuristic approaches to detect spam accounts, phishing websites, and ransomware variants. Some propose new techniques like using Twitter data and mobile messages to detect spam or combining feature selection with metaheuristic algorithms to identify phishing sites. Others analyze hyperlinks or topic distributions to identify phishing attacks or classify anti-phishing solutions. The document also mentions the rise of ransomware attacks and techniques used in related studies.
A Hybrid Approach For Phishing Website Detection Using Machine Learning.vivatechijri
In this technical age there are many ways where an attacker can get access to people’s sensitive information illegitimately. One of the ways is Phishing, Phishing is an activity of misleading people into giving their sensitive information on fraud websites that lookalike to the real website. The phishers aim is to steal personal information, bank details etc. Day by day it’s getting more and more risky to enter your personal information on websites fearing that it might be a phishing attack and can steal your sensitive information. That’s why phishing website detection is necessary to alert the user and block the website. An automated detection of phishing attack is necessary one of which is machine learning. Machine Learning is one of the efficient techniques to detect phishing attack as it removes drawback of existing approaches. Efficient machine learning model with content based approach proves very effective to detect phishing websites.
Our proposed system uses Hybrid approach which combines machine learning based method and content based method. The URL based features will be extracted and passed to machine learning model and in content based approach, TF-IDF algorithm will detect a phishing website by using the top keywords of a web page. This hybrid approach is used to achieve highly efficient result. Finally, our system will notify and alert user if the website is Phishing or Legitimate.
An IAC Approach for Detecting Profile Cloning in Online Social NetworksIJNSA Journal
Nowadays, Online Social Networks (OSNs) are popular websites on the internet, which millions of users register on and share their own personal information with others. Privacy threats and disclosing personal information are the most important concerns of OSNs’ users. Recently, a new attack which is named Identity Cloned Attack is detected on OSNs. In this attack the attacker tries to make a fake identity of a real user in order to access to private information of the users’ friends which they do not publish on the public profiles. In today OSNs, there are some verification services, but they are not active services and they are useful for users who are familiar with online identity issues. In this paper, Identity cloned attacks are explained in more details and a new and precise method to detect profile cloning in online social networks is proposed. In this method, first, the social network is shown in a form of graph, then, according to similarities among users, this graph is divided into smaller communities. Afterwards, all of the similar profiles to the real profile are gathered (from the same community), then strength of relationship (among all selected profiles and the real profile) is calculated, and those which have the less strength of relationship will be verified by mutual friend system. In this study, in order to evaluate the effectiveness of proposed method, all steps are applied on a dataset of Facebook, and finally this work is compared with two previous works by applying them on the dataset.
PHISHING MITIGATION TECHNIQUES: A LITERATURE SURVEYIJNSA Journal
Email is a channel of communication which is considered to be a confidential medium of communication for exchange of information among individuals and organisations. The confidentiality consideration about e-mail is no longer the case as attackers send malicious emails to users to deceive them into disclosing their private personal information such as username, password, and bank card details, etc. In search of a solution to combat phishing cybercrime attacks, different approaches have been developed. However, the traditional exiting solutions have been limited in assisting email users to identify phishing emails from legitimate ones. This paper reveals the different email and website phishing solutions in phishing attack detection. It first provides a literature analysis of different existing phishing mitigation approaches. It then provides a discussion on the limitations of the techniques, before concluding with an explorationin to how phishing detection can be improved.
An Approach for Malicious Spam Detection in Email with Comparison of Differen...IRJET Journal
This document summarizes a research paper that proposes a model to improve detection of malicious spam emails through feature selection. The model employs a novel dataset for feature selection to optimize classification parameters, prediction accuracy, and computation time. Feature selection is expected to improve training time and classification accuracy. The paper also compares various classifiers, including Naive Bayes and Support Vector Machine, on the selected feature subset. The goal is to automatically learn to detect malicious spam emails, which threaten privacy and security by spreading malware, phishing links, and sensitive data theft.
Similar to Spammer taxonomy using scientific approach (20)
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Kaxil Naik
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Open Source Contributions to Postgres: The Basics POSETTE 2024ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
1. 2017 IEEE/WIC/ACM International Joint Conference on Web Intelligence(WI) and IntelligentAgent
Technologies(IAT)
Spammer Taxonomy Using Strategic Approach over bench Mark structural Social
Networks Attributes
Balogun Abiodun Kamoru1,
Azmi B. Jaafar2,
Masrah Azrifah Azmi Murad3
, Marzanah Binti Jabar4
Department of information Systems and Software Engineering, Faculty of Computer Science and Information
technology,Universiti Putra Malaysia, Serdang 43300,Selangor D.E.Malaysia.
e-mail: balogun@consultant.com, Azmij@upm.edu.my, Masrah@upm.edu.my, Marzanah@upm.edu.my
Abstract
Social network has become so popular with overwhelming high rate of growth, due to this popularity theonline social
networks is facing the issues of spamming, which has leads to unsubstantialeconomic loss to this menace of spamand
spammers activities. It has leads to uncontrollable dissemination of viruses and malwares, promotionalads, phishing, and
scams. spam activities has enter a new dangerous dimension, the spammers have step up their games and tactics online
social networks, it consumes large amounts of network bandwidth leading to less revenue and significant economic loss to
both private and public sectors. From the previous scholars work on spammer classification taxonomy, various machine
learning techniques have been extensively used to detect spamactivities and spammers in online social networks. There are
various classifier that are learn over content-based features extracted from theuser's interactions and profiles to label them
as spam/spammers or legitimate. But recently, new network structural bench mark features have been proposed for
spammer detection task, but their importance using structuralbench mark learning methods has not been extensively
evaluated yet. In this research work, we evaluate thethe metric performance of some structuralbench mark learning methods
using scientific and strategic approach based attributes extracted from an interaction network for the task of spammer
detection in online social network.
General Word: Spam Detection, Bench Mark. Information Security
Keywords: Social Network Security; Spam Detection; Spam taxonomy; Machine Learning; Strategic/Scientifc
Approaches/Techniques.
I. INTRODUCTION
spam and spamming activities are undoubtedly a very serious problem in online social
networks, spam has really enter a dangerous dimension which prompt response from the
various research scholars to embark on the spammer classification taxonomy. one of the big
challenges and issues faced by online social networks(OSN) is to deal with undesirable users
and their malicious users (spammers) to project irrelevant information to as large number of
legitimate users as possible. The motive behind spamming commonly includes promoting
products, Viral marketing, spreading fads, espionage activities, cyber war, and in some cases
possibly to harass legitimate users to decrease their trust in a particular services [1]. Thus,
spam email is ubiquitous and so is its usage abusive.
2. According to Symantec Intelligence Report,(SIR,2017) [2] indicate that the global ratio of
spam in email traffic is 71.9% , furthermore, adversaries have taken advantage of the ability
to send countless emails anonymously, at the speed of light, to carry on vicious activities ( e.g
advertising of fake gods or medications, scams causing financial losses or even severely, to
commit cyber crimes e.g child pornography, identity theft, phishing and malware
distribution, Son Dinh et al,2015 [3].
It becomes highly important to devise a scientific and strategic approaches and techniques to
identify spammers and their behaviour in social network attributes, along this critical
direction many spammer and spam detection methods have been proposed by previous and
existing scholars, which are mostly based on based content analysis of users’ interaction data
value, many counter filtering techniques based on the usage of non-dictionary words and
images in spam objects are often employed by spammers. Content-based spam filtering
systems also demand high value computations. Some spammer detection methods and
techniques are based on learning classification models from network-based topological
features of the interacting nodes in social networks, the attributes includes degree, out-degree,
reciprocity, clustering coefficient.etc.
The impact of social spam is already significant. A social spam message is potentially seen
by all the followers and recipients’ friends. Even worse, it might cause misdirection and
misunderstanding in public domain especially social networks like Facebook, Twitter, Sina
weibo, Tagged, Badoo, Orkut, LinkedIn, my space etc [4].According to Khurana and Kumar
[4], it gives the types of spammers as the following: (a) Fake Users (b) Phishers (c)
Promoters (d) Spammers.
Fake Users: are the users who impersonate the profile of a genuine users to send
spam content to the friends’ of that user or other users in the network.
Phishers: are the users who behave like a normal user to acquire personal data of other
genuine users.
Promoters: are the one who sends malicious links of advertisements or other
promotional links to others so as to obtain their personal information.
Spammers: are the malicious users who contaminate the information presented by the
legitimate users and in turn pose a risk to the security and privacy of social networks.
Spammers belong to one of the following categories.
While the motives of the spammers are as follows:
Phishing attacks
Disseminate pornography
Spread viruses
Compromise system reputation.
Recently,in [8], we have proposed some community-based topological features to learn
improved classification models for identifying spammers in online social networks. However,
the results only spanned over single classifiers. In this paper, we aim to evaluate the
3. performance metrics of the proposed attributes in [8] for learning scientific approach models
on bench mark structural classification, the three bench mark structural learning methods use
are the following- Bagging, Boosting, and Stacking over topological features extracted from
a real-world interaction network with artificially planted spammers on social networks, the
results are generated for both single and bench mark structural classifiers to evaluate their
performance metrics.
Within the past few years, online social networks such as Facebook, Twitter, Weibo, etc, has
become one of the major way for the internet users to keep communications with their
friends. According to Statista report [5] , the number of social network users has reached 1.61
Billion until late 2013, and is estimated to be around 2.33 Billion users globe, until the end of
2017.
With great technical and commercial success, social network platform also provides a larger
amount of opportunities for broadcasting spammers, which spreads malicious messages and
behaviour. According to Nexgate’s report [6] [7] during the first half of 2016, the growth of
social spam has been 355%, much faster than the growth rate of accounts and messages on
the most branded social networks.
II. RELATED WORK
Spam detection methods and techniques usually involve two approaches-content-based
learning and topology based learning. The main idea behind content-based learning revolves
around the observation that spammers use distinguished keywords, URLs, etc. in their
interactions and to define their profiles . such content based attributes are used to learn
classification models to label messages and profiles as legitimate or spam [9]. However, such
approach is often deceived by spammers using copy profiling and content obfuscation. On
the other hand, topology-based learning methods aim to exploit structural social network
attributes like clustering coefficient, community structures, reciprocity, degree,etc. to
characterize network behaviour of legitimate and spammer accounts.
Shrivastava et al. [10] incorporated attributes including clustering coefficient and
neighbourhood independence to deal with Random Link Attacks from spammers. Bhat and
Abulaish [11] extracted features in-links, out-links, cross-links, etc, from a web graph to
classify pages as spam or benign. Other methods include finding physical node clusters based
on network-level attributes from online communication networks[12]. To detect spam
clusters, Gao et al [13] used two widely acknowledged distinguish features of spam
campaign-distributed coverage and bursty nature, the distributed property is quantified using
the number of users that send wall posts in the cluster, whereas the busty property is based on
the intuition that most spam campaigns involve coordinated action by many accounts within
short periods of time [14]. The methods proposed in[8] and [15] used a community
detection method to split the interaction network into communities and then extract
community –based attributes of network nodes (users) to classify them as a spammers or
legitimate.
4. One of the limitations of the approaches and techniques mentioned earlier is the
classification models used by them are mostly single. In the literature, there exist bench mark
structural classification methods that can be used to improve the performance metrics of
classifiers by learning multiple models over the same training example set and then using
some aggregation method to decide upon a single combined label determined by multiples
classifiers. Although , some Bench Mark Structural Methods have been used in content-
based classification of spammers, they have not yet been evaluated for the topological
attributes. however, the use of Bench Mark Structural learning methods for improving the
performance metrics of spam detection methods has been adopted by many researchers, but
the studies have been mainly oriented towards content-based classification. In [16], the
authors used an ensemble approach or Bench Mark Structure method with sampling
classification strategy incorporating C4.5, bagging, and ada boost. Their results using
Ensemble approach or bench mark structure method showed improvement in web spam
detection performance metrics effectively. Using a text corpus, the authors in [17] aimed to
show the significance of bench mark structure method classifiers for spam detection.
However, they failed to show any significant improvement in the task. In [18] , the authors
highlighted the high performance of Bench Mark method Classifiers involving Adaboost,
Stacking and Decision Tree, against the best and excellent performances of single classifiers
for e-mail spam detection using a content-based approach. In [15], the authors showed that
the classifier proposed by Caruana et al. performed better than the most individual and
classifiers implemented in WEKA for the[18] task of email spam detection. In [11] , the
authors exploited both content-based and link-based features to compile a minimal attributes
set that can be computed incrementally in a quick manner to allow intercepting spam. They
also showed that for a selected feature set, Bench mark Classification techniques previously
published method and the web spam challenge 2008-2016 best results.
III. ENSEMBLE METHODS OR BENCH MARK STRUCTURAL METHODS
The Ensemble method or Bench Mark Structural Method classifiers group multiple machine
learning instances to improve the classification results of a system. It is based on the
assumption that combination of multiple classifiers may be able to produce an overall
classifier which is more stable and accurate than any of its components. According to
Dietterich [19], the performance metric advantage of ensemble classifiers can be attributed
to three key factors: (i) By Combining multiple hypotheses to form an ensemble; their votes
are averaged and the risk of selecting an incorrect hypotheses is reduced, (ii) by starting a
local search in different locations; ensemble can provide a better approximation of the true
underlying, (iii) a weighted sum of the hypothesis within the ensemble, which may extend the
space of representable hypotheses to allow a more accurate representation.
A brief description of the three mostly used ensemble methods or bench mark structural
methods is given in the following paragraphs.
5. I. BAGGING
Bootstrap aggregation (or bagging), proposed by Quilan [20] , involves training multiple
instances of classifiers on a sample of training examples which are taken at random with
replacement (bootstrap sample). Finally, the labels of the test samples are determined by a
majority vote of each internally learned classifier.
II. BOOSTING
Boosting is referred to Arcing (Adaptive Resampling and Combining) [20], boosting first
involves assigning weights to the training set instances, then on each learning iteration it
increases and decreases the weights for misclassified and correctly classified instances,
respectively. The difficulty of learning problem is effectively increased on each iteration,
with an attempt to minimize the weighted error on training set. It involves repeatedly learning
a weak classifier on various distributed samples of the training data. The classifiers learnt at
each step are then combined into a single strong classifier to achieve a higher accuracy than
the individual components. The final classification decision is a combination of the decisions
made in all rounds, namely a weighted majority vote, where decisions with lower
classification error have higher weight.
III. STACKING
Stacking is an ensemble learning approach or bench mark structural methods which aims to
determine the reliability of its constituent individual classifiers (often different models) to
achieve the highest generalization accuracy [11]. It Involves using a meta-learner, which uses
the predicted classification of the constituent classifiers as input attributes. The meta-
classifier is used to reflect the true performance metric measure of the individual constituent
classifiers.
IV. EXPERIMENTAL AND OBSERVATION RESULT
The main and major aim of this paper is to come up with the evaluation on the performance
metrics of the ensemble classifier or bench mark classifier for spam detection and spammer in
online social networks. In order to do so, community-based structural attributes proposed in
[10] are extracted from a real-world social network dataset with artificially planted
spammers. We compare the performance metric of multiple classifiers including decision
tress, Naivebayes, and k-NN and their ensemble variants implemented in WEKA [21]
software.
Y P(Out-degree=y)
1 0.664
2 0.171
3 0.07
4 0.04
6. 5 0.024
6 0.014
7 0.01
8 0.07
TABLE 1: SPAMMER OUT-DEGREE DISTRIBUTION
A. DATASET
For experimental work, we use a real- world social network dataset representing the wall
post activity of about 63891 Facebook users [22]. The nodes in this network are considered to
be legitimate nodes. We inject additional nodes in the network to simulate spammer behavior.
In this view, we subsequently filter out all the nodes having zero in-degree or out-degree, and
any isolated nodes from the network to represent them as legitimate networks. This results in
a network which 32693 legitimate nodes. Thereafter, in order to simulate spammers, we
generate a set of 1000 isolated nodes for the legitimate network, which create out-links to
randomly selected nodes in the legitimate network. The out-links or the out-degree generated
for the spammers are not random but follow. The messages of the spammers are expected to
be leaqst often reciprocated. Thus, the probability of a legitimate node replying to a spammer
is set to 0.05 [23].
In order to make the detection task more difficult, we generate another set of 1000 spammer
nodes, which try to mimic the clustering property of legitimate nodes. In order to do so, we
have used the LFR- benchmark generator [24] to generate a directed network of 1000 nodes
with embedded community structures. The LFR-benchmark parameters used to generate the
network are shown in Table II.
Parameter Description Value
N Number of nodes 1000
K Average degree 15
Kmax Max degree 60
Cmin Minimum community size 15
Cmax Maximum community size 60
𝜕 Degree exponent -1
₾ Community exponent -1
µ Mixing parameter 0.1
TABLE II : LFR-BENCHMARK PARAMETER DESCRIPTION AND VALUES
Now, for each node in the synthetic network, we rewire a set of its out-links towards a set of
randomly selected nodes in the legitimate network such that spamming out-degree i.e the
rewired out-links follows the distribution given Table I, In this regard, a total of 2000
7. spammer nodes (out of which 1000 mimic the clustering property of legitimate nodes) are
added to the legitimate network resulting in a total of 34693 nodes. Thereafter, we extract
community -based features proposed in [12]from the network.
B. RESULTS
In order to evaluate the significance of the bench mark structural method using community-
based structural features, we learn a set of classifiers from WEKA classifier on the training
examples containing the community-based features from the datasets mentioned in the
previous section. We evaluate the performance metrics of the three classifiers including J48
(decision-tree) [25], IBk (k-NN using k=5 nearest neighbors) [26], and NaiveBayes [27] by
considering them individually and also by using bagging, boosting and stacking over them.
We use 10-fold cross validation for each classifier on the dataset to evaluate performance
metric.
Table III presents the performance metrics (averaged for the two classes) of the various
individual classifiers on the dataset with planted spammers, wherein it is clear that the
decision tree based classifier J48 performs better than the two classifiers.
Classifier
(Individual)
TP
Rate
FP
Rate
Precision Recall F- Measure
J48 0.963 0.075 0.963 0.963 0.963
IBk (k=4) 0.938 0.159 0.159 0.937 0.937
NaiveBayes 0.914 0.175 0.917 0.914 0.915
TABLE III. PERFORMANCE METRICS OF INDIVIDUAL CLASSIFIERS
Tables IV and V present the performance metric of the bagging and boosting bench mark
structural method over the three base classifiers, respectively for the spammer detection task.
It can be clearly seen from the Tables that the performances metrics of J48 and IBk classifiers
using bagging and boosting bench mark structural method approach is better than their
individual performances metrics. However, in case of Naive Bayes classifier, the bench mark
structural method approaches show low performance metrics than their individual
performance for spammer detection task using structural attributes.
8. Classifier
(Bagging)
TP
Rate
FP
Rate
Precision Recall F- Measure
J48 0.969 0.067 0.969 0.969 0.969
IBk (k=4) 0.941 0.133 0.942 0.941 0.942
Naivebayes 0.706 0.155 0.857 0.706 0.741
TABLE IV: PERFORMANCE METRIC OF CLASSIFIER USING BAGGING BENCH
MARK STRUCTURAL METHOD
Classifier
(Boosting)
TP
Rate
FP
Rate
Precision Recall F- Measure
J48 0.966 0.082 0.966 0.966 0.966
IBk (k=4) 0.938 0.159 0.937 0.938 0.937
NaiveBayes 0.705 0.155 0.857 0.705 0.74
TABLE V: PERFORMANC METRIC OF CLASSIFIER USING BOOSTING BENCH
MARK STRUCTURAL METHOD
Classifier
(Stacking)
TP
Rate
FP
Rate
Precision Recall F- Measure
J48 (Meta) 0.961 0.085 0.962 0.961 0.961
TABLE VI: PERFORMANCE METRIC OF THE STACKING BENCH MARK
STRUCTURE METHOD INVOLVING ALL THREE CLASSIFIER AND J48 AS THE
META CLASSIFIER
TableVI presents the performance metrics of the stacking bench mark structural method
learning approach which incorporates all three classifiers (J48, IBk, Naive Bayes) as its base
classifiers and J48 as its meta classifier. It can be observed that its performance is lower than
the best case of the other two bench mark structural approaches and closer to the individua;
performance metrics of the J48 classifier.
From these results, it can be observed that the bagging bench mark structural learning
approach using J48 classifier performs important better than the individual performance of
9. IBk and naive bayes classifiers and also better than the other two bench mark approaches, i.e,
boosting and stacking for the task of spammer detection in online social networks.
V. CONCLUSION
Spam detection in online social networks is challenging, but a highly desirable accomplish
task. Numerous machine learning approaches using content-based features have been used in
previous research to detect email spam. Bench Mark learning approaches like bagging and
boosting that aim to improve the efficiency of the individual classifiers exist in literature,
but they have not been extensively evaluated for the spammer detection task. Moreover, new
structural features based on community structures of online social network users have also
been proposed recently for spammer detection. This paper evaluates the performance metrics
of some bench mark learning approaches for the task of spammer detection in online social
networks. The observation of the experimental results reveal that the bagging bench mark
learning approach using J48 (decision tree) base classifier performs better than its individual
model and also better than some other ensemble or bench mark learning approaches for
spammer detection using structural social network attributes.
REFERENCES
[1] Saini Jacob.S, S.Murugappan. " A study of Spam Detection Algorithm on Social Media
Networks" Journal of Computer Science 10 (10) : 2135-2140, 2014. JCS.
[2] Symantec Intellignce Report
SIR2017)https://www.symantec.com/content/dam/symantec/docs/reports/istr-22-2017-en.pdf
[3] Son Dinh, Taher Azeb, Francis Fortin, Djedjiga Mouheb and Mourad Debbabi; "Spam
Campaign Detection, Analysis, and Investigation" From the proceedings ofThe Digital
Forensic Research ConferenceDFRWS 2015 EUDublin, Ireland (Mar 23rd- 26th) 2015.
[4] Girisha Khurana, Marish Kumar; "Review : Efficient Spam Detection on Social network"
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
Vol 3 Issue VI,June 2015.
[5] https://www.statista.com/
10. [6] nextgate report (https://www.proofpoint.com/us/solutions/social-media-protection-and-
compliance)
[7] Trust evaluation based content filtering in social interactive data, in: proceedings of the
2013 International Conference on Cloud Computing and big Data (CloudCom-Asia),
IEEE,2013,pp.538-542.
[8] De Wang, danesh Irani, Pu Calton; " A Social-Spam Detection Framework" CEAS-
2011,8th Annual Collaboration, Electronic messaging Anti-Abuse and Spam Conference
Sept.1-2, 2011. Perth Australia.
[9] Nasim Eshraqi, Mehrdad Jalali, Hossein Moattar, " Spam Detection In Social Networks:
A Review".2nd International Congress on technology Communication and Knowledge
(ICTCK,2015) November,11-12,2015. Islamic Azad University, Iran.
[10] Shrivastava, N., Majumder, A. and Rastogi, R.; Mining (Social) Network graphs to
detect random link attacks, In Proceedings of the IEEE-24th International Conference on
Data Engineering (ICDE'08), Washington Dc,USA. IEEE Computer Society. pp 486-
495,2008.
[11] Bhat. S.Y., and Abulaish, M." Community-based features for identifying spammers in
online social networks. In Proceedings of 2013 IEEE/ACM International Conference on
Advances in Social Networks Analysis and Mining, pp.100-107,ACM, 2013.
[12] Freund Y. and Schapire R. E., Experiments with a new boosting algorithm. In
Proceedings of the 13th International Conference on Machine Learning, pp.325-332,1996.
[13] Gao.H, Hu.J, Wilson.C, Li.Z, Chen.Y, Zhao.B.Y. Detecting and characterizing social
spam campaigns Proceedings of the 10th ACM SIGCOMM conference on internet
measurement, ACM (2010), pp. 35–47.
[14] Lam,H. A Learning Approach to spam detection Based on Social Networks. Hong Kong
University of Science and Technology.2007.
[15] Stringhini, G., Kruegel, C, and Vigna, G. Detecting Spammers on Social networks. In
Proceedings of the 26th Annual Computer Security Applicationd Conference
(ACSAC'10),NY, USA,ACM,pp,1-9,2010.
[16] Gan,Q. and Suel, T. Improving web spam classifiers using link structure, In Proceedings
of the 3rd International Workshop on Adversarial Information Retrieval on the Web (AIR
WEB'07) NY,USA,ACM,pp. 17-20.2007.
[17] Ramachandran, A., Feamster, N. and Vempala,S. Filtering Spam with behavioral
blacklisting, In Proceedings of the 14th ACM Conference on Computer and Communications
Security (CCS'07),NY,USA,ACM,pp 342-351.2007.
11. [18] Caruana, R., et al. Ensemble Selection from Libraries of models. In Proceedings of the
21st International Conferenceon Machine Learning, pp.137-144,2004.
[19] Dietterich, T. G. Ensemble methods in Machine Learning Lecture Notes in Computer
Science, pp,1-5.2000.
[20] Quilan, J. Bagging, Boosting and C4.5. In 13th National Conference on Artificial
Intelligence. AAAI/MIT Press, 1996.
[21] Frank, E., Hall. M., Holmes,G., Kirkby,R., Pfahringer ,B., Witten, I, and Trigg, I. Weka,
In Data Mining and Knowledge Discovery Handbook, O. Maimon and L. Rokach (Eds),
Springer,pp.1305-1314,2005.
[22] Viswanath, B., et al., On The Evolution of User interactionin Facebook. In Proceeding
of the workshop on Online Social Networks, pp.37-42,2009.
[23] Bouguessa, M. An Unsupervised approach for identifying Spammers in Social
Networks. In Proceedings of the IEEE 23rd International Conference on Tools with Artificial
Intelligence (ICTAI'11) Washington,DC, USA, IEEE Computer Society, pp,832-840,2011.
[24] Lancichinetti, A. and Fortunato, S." Benchmarks for testing Community detection
algorithm on directed and weighted graphs with overlapping communities. physical review
E.Vol.80,2009.
[25] Quilan, J.R.. C4.5: Programs for machine learning, SanFrancisco CA, USA, Morgan
Kaufimann Publishers Inc. 1993.
[26] Aha. D. W., et al. Instance-based learning algorithms.Machine Learning, Vol.6,no.1,pp,
37-66. 1991.
[27] John, G. H. and Langley, P. estimating Continous distributions in Bayesians Classifiers.
In Proceedings of the11th Conference on Uncertainty in Artificial Intelligence
(UAI'95), San Francisco, CA, USA, Morgan Kaufimann Publishers Inc,. pp.338-345,1995.