In recent years, the increasing use of e-mails has led to the emergence and increase of problems caused by mass unwanted
messages which are commonly known as spam. In this study, by using decision trees, support vector machine, Naïve Bayes theorem
and voting algorithm, a new version for identifying and classifying spams is provided. In order to verify the proposed method, a set of
a mails are chosen to get tested. First three algorithms try to detect spams, and then by using voting method, spams are identified. The
advantage of this method is utilizing a combination of three algorithms at the same time: decision tree, support vector machine and
Naïve Bayes method. During the evaluation of this method, a data set is analyzed by Weka software. Charts prepared in spam
detection indicate improved accuracy compared to the previous methods.
Identifying Valid Email Spam Emails Using Decision TreeEditor IJCATR
The increasing use of e-mail and the growing trend of Internet users sending unsolicited bulk e-mail, the need for an antispam
filtering or have created, Filter large poster have been produced in this area, each with its own method and some parameters are
to recognize spam. The advantage of this method is the simultaneous use of two algorithms decision tree ID3 - Mamdani and Naive
Bayesian is fuzzy. The first two algorithms are then used to detect spam Bagging approach is to identify spam. In the evaluation of this
dataset contains a thousand letters have been analyzed by the software Weka charts provided in spam detection accuracy than previous
methods of improvement
Now a days Short Message Service(SMS) is most popular way to communication for mobile user because it is cheapest mode or version for communication than other mode.SMS is used for transmitting short length msg of around 160 character to different devices such as smart phones, cellular phones, PDAs using standardized communication protocols. The amount of Short Message Service (SMS) spam is increasing. SMS spam should be put into the spam folder, not the inbox. The growth of the mobile phone users has led to a dramatic increase in SMS spam messages. To avoid this problem SMS filtering Techniques are used. Our proposed approach filters SMS spam on an independent mobile phone on a large dataset and acceptable processing time. There are different approaches able to automatically detect and remove most of these messages, and the best-known ones are based on Bayesian decision theory and Support Vector Machines. Riya Mehta | Ankita Gandhi"A Survey: SMS Spam Filtering" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-3 , April 2018, URL: http://www.ijtsrd.com/papers/ijtsrd12850.pdf http://www.ijtsrd.com/computer-science/data-miining/12850/a-survey-sms-spam-filtering/riya-mehta
Analysis of an image spam in email based on content analysisijnlc
Researchers initially have addressed the problem of spam detection as a text classification or
categorization problem. However, as spammers’ continue to develop new techniques and the type of email
content becomes more disparate, text-based anti-spam approaches alone are not sufficiently enough in
preventing spam. In an attempt to defeat the anti-spam development technologies, spammers have recently
adopted the image spam trick to make the scrutiny of emails’ body text inefficient. The main idea behind
this project is to design a spam detection system. The system will be enabled to analyze the content of
emails, in particular the artificially generated image sent as attachment in an email. The system will
analyze the image content and classify the embedded image as spam or legitimate hence classify the email
accordingly.
Identifying Valid Email Spam Emails Using Decision TreeEditor IJCATR
The increasing use of e-mail and the growing trend of Internet users sending unsolicited bulk e-mail, the need for an antispam
filtering or have created, Filter large poster have been produced in this area, each with its own method and some parameters are
to recognize spam. The advantage of this method is the simultaneous use of two algorithms decision tree ID3 - Mamdani and Naive
Bayesian is fuzzy. The first two algorithms are then used to detect spam Bagging approach is to identify spam. In the evaluation of this
dataset contains a thousand letters have been analyzed by the software Weka charts provided in spam detection accuracy than previous
methods of improvement
Now a days Short Message Service(SMS) is most popular way to communication for mobile user because it is cheapest mode or version for communication than other mode.SMS is used for transmitting short length msg of around 160 character to different devices such as smart phones, cellular phones, PDAs using standardized communication protocols. The amount of Short Message Service (SMS) spam is increasing. SMS spam should be put into the spam folder, not the inbox. The growth of the mobile phone users has led to a dramatic increase in SMS spam messages. To avoid this problem SMS filtering Techniques are used. Our proposed approach filters SMS spam on an independent mobile phone on a large dataset and acceptable processing time. There are different approaches able to automatically detect and remove most of these messages, and the best-known ones are based on Bayesian decision theory and Support Vector Machines. Riya Mehta | Ankita Gandhi"A Survey: SMS Spam Filtering" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-3 , April 2018, URL: http://www.ijtsrd.com/papers/ijtsrd12850.pdf http://www.ijtsrd.com/computer-science/data-miining/12850/a-survey-sms-spam-filtering/riya-mehta
Analysis of an image spam in email based on content analysisijnlc
Researchers initially have addressed the problem of spam detection as a text classification or
categorization problem. However, as spammers’ continue to develop new techniques and the type of email
content becomes more disparate, text-based anti-spam approaches alone are not sufficiently enough in
preventing spam. In an attempt to defeat the anti-spam development technologies, spammers have recently
adopted the image spam trick to make the scrutiny of emails’ body text inefficient. The main idea behind
this project is to design a spam detection system. The system will be enabled to analyze the content of
emails, in particular the artificially generated image sent as attachment in an email. The system will
analyze the image content and classify the embedded image as spam or legitimate hence classify the email
accordingly.
MINIMIZING THE TIME OF SPAM MAIL DETECTION BY RELOCATING FILTERING SYSTEM TO ...IJNSA Journal
Unsolicited Bulk Emails (also known as Spam) are undesirable emails sent to massive number of users. Spam emails consume the network resources and cause lots of security uncertainties. As we studied, the location where the spam filter operates in is an important parameter to preserve network resources. Although there are many different methods to block spam emails, most of program developers only intend to block spam emails from being delivered to their clients. In this paper, we will introduce a new and efficient approach to prevent spam emails from being transferred. The result shows that if we focus on developing a filtering method for spams emails in the sender mail server rather than the receiver mail server, we can detect the spam emails in the shortest time consequently to avoid wasting network resources.
A multi layer architecture for spam-detection systemcsandit
As the email is becoming a prominent mode of communication so are the attempts to misuse it to
take undue advantage of its low cost and high reachability. However, as email communication
is very cheap, spammers are taking advantage of it for advertising their products, for
committing cybercrimes. So, researchers are working hard to combat with the spammers. Many
spam detections techniques and systems are built to fight spammers. But the spammers are
continuously finding new ways to defeat the existing filters. This paper describes the existing
spam filters techniques and proposes a multi-level architecture for spam email detection. We
present the analysis of the architecture to prove the effectiveness of the architecture.
Email spam, also known as junk email or unsolicited
bulk email(UBE), is a subset of electronic spam involving nearly
identical messages sent to numerous recipients by email. Clicking
on links in spam email may send users to phishing web sites or sites
that are hosting malware. Spam email may also include malware as
scripts or other executable file attachments. Definitions of spam
usually include the aspects that email is unsolicited and sent in bulk
In order to overcome spam problem many researchers have
been conducted and various method of anti-spam filtering have
been implemented. A spam filter is a set of instruction for
determining the status of the received email. Spam filters are used
to prevent spam email passing through the recipient. The main
challenge is how to design an effective spam filter that allows
desired email to pass through while blocking the unwanted email.
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...IJNSA Journal
Email is one of the most popular communication media in the current century; it has become an effective
and fast method to share and information exchangeall over the world. In recent years, emails users are
facing problem which is spam emails. Spam emails are unsolicited, bulk emails are sent by spammers. It
consumes storage of mail servers, waste of time and consumes network bandwidth.Many methods used for
spam filtering to classify email messages into two groups spam and non-spam. In general, one of the most
powerful tools used for data classification is Artificial Neural Networks (ANNs); it has the capability of
dealing a huge amount of data with high dimensionality in better accuracy. One important type of ANNs is
the Radial Basis Function Neural Networks (RBFNN) that will be used in this work to classify spam
message. In this paper, we present a new approach of spam filtering technique which combinesRBFNN and
Particles Swarm Optimization (PSO) algorithm (HC-RBFPSO). The proposed approach uses PSO
algorithm to optimize the RBFNN parameters, depending on the evolutionary heuristic search process of
PSO. PSO use to optimize the best position of the RBFNN centers c. The Radii r optimize using K-Nearest
Neighbors algorithmand the weights w optimize using Singular Value Decomposition algorithm within
each iterative process of PSO depending the fitness (error) function. The experiments are conducted on
spam dataset namely SPAMBASE downloaded from UCI Machine Learning Repository. The experimental
results show that our approach is performed in accuracy compared with other approaches that use the
same dataset.
Proposed efficient algorithm to filter spam using machine learning techniques
Ali Shafigh Aski a, *, Navid Khalilzadeh Sourati b
a Department of Computer Engineering, Islamic Azad University, Sari Branch, Islamic Republic of Iran
b Islamic Azad University of Amol, Ayatollah Amoli Branch, Islamic Republic of Iran
Evaluation of Spam Detection and Prevention Frameworks for Email and Image Sp...Pedram Hayati
In recent years, online spam has become a major problem for
the sustainability of the Internet. Excessive amounts of spam
are not only reducing the quality of information available on
the Internet but also creating concern amongst search engines
and web users. This paper aims to analyse existing works in
two different categories of spam domains - email spam and
image spam to gain a deeper understanding of this problem.
Future research directions are also presented in these spam
domains.
More info: http://debii.curtin.edu.au/~pedram/research/publications/76-evaluation-of-spam-detection-and-prevention-frameworks-for-email-and-image-spam-a-state-of-art.html
Spams are unwanted and also undesirable emails which are mass sent to the numerous victims. Further
penetration of spams into electronic processors and communication equipments such as computers and
mobiles as well as lack of control on the information shared on the internet and other communication
networks and also inefficiency of the spam detecting methods developed for Persian contexts are among the
main challenging issues of the Persian subscribers. This paper presents a novel and efficient method for
thematic identification of Persian spams. The proposed method is capable of identifying the Persian, spams
and also “Penglish” spams. “Penglish” is made up of two words Persian and English and demonstrates a
Persian text which is written by English alphabetic letters. Based on the experimental analysis of the 10000
spams of different type the efficiency of the proposed method is evaluated to be more than 98%. The
presented method is also capable of updating its databases taking the advantage of the feedbacks received
from the users.
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMSIJNSA Journal
Email systems have suffered from degraded quality of service due to rampant spam, phishing and fraudulent emails. This is partly because the classification speed of email filtering systems falls far behind the requirements of email service providers. We are motivated to address this issue from the perspective of computer architecture support. In this paper, as the first step towards novel architecture designs, we present extensive performance data collected from measurement and profiling experiments using representative email filtering systems including CRM114, DSPAM, SpamAssassin and TREC Bogofilter. We provide detailed analysis of the time consuming functions in the systems under study. We also show how the processor architecture parameters affect the performance of these email filters through simulation experiments.
Indonesian language email spam detection using N-gram and Naïve Bayes algorithmjournalBEEI
Indonesia is ranked the top 8th out of the total country population in the world for the global spammers. Web-based spam filter service with the REST API type can be used to detect email spam in the Indonesian language on the email server or various types of email server applications. With REST API, then there will be data exchange between the applications with JSON data type using existing HTTP commands. One type of spam filter commonly used is Bayesian Filtering, where the Naïve Bayes algorithm is used as a classification algorithm. Meanwhile, the N-gram method is used to increase the accuracy of the implementation of the Naïve Bayes algorithm in this study. N-gram and Naïve Bayes algorithms to detect spam email in the Indonesian language have successfully been implemented with accuracy around 0.615 until 0.94, precision at 0.566 until 0.924, recall at 0.96 until 1.00, and F-measure at 0.721 until 0.942. The best solution is found by using the 5-gram method with the highest score of accuracy at 0.94, precision at 0.924, recall at 0.96, and F-measure value at 0.942.
Spam filtering by using Genetic based Feature SelectionEditor IJCATR
Spam is defined as redundant and unwanted electronica letters, and nowadays, it has created many problems in business life such as
occupying networks bandwidth and the space of user’s mailbox. Due to these problems, much research has been carried out in this
regard by using classification technique. The resent research show that feature selection can have positive effect on the efficiency of
machine learning algorithm. Most algorithms try to present a data model depending on certain detection of small set of features.
Unrelated features in the process of making model result in weak estimation and more computations. In this research it has been tried
to evaluate spam detection in legal electronica letters, and their effect on several Machin learning algorithms through presenting a
feature selection method based on genetic algorithm. Bayesian network and KNN classifiers have been taken into account in
classification phase and spam base dataset is used.
This paper is aimed to implement a robust speaker identification system. It is a software architecture which identifies the current talker out of a set of speakers. The system is emphasized on text-dependent speaker identification system. It contains three main modules: endpoint detection, feature extraction and feature matching. The additional module, endpoint detection, removes unwanted signal and background noise from the input speech signal before subsequent processing. In the proposed system, Short-Term Energy analysis is used for endpoint detection. Mel-frequency Cepstrum Coefficients (MFCC) is applied for feature extraction to extract a small amount of data from the voice signal that can later be used to represent each speaker. For feature matching, Vector Quantization (VQ) approach using Linde, Buzo and Gray (LBG) clustering algorithm is proposed because it can reduce the amount of data and complexity. The experimental study shows that the proposed system is more robust than using the original system and it is faster in computation than the existing one. To implement this system MATLAB is used for programming. Zaw Win Aung"A Robust Speaker Identification System" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-5 , August 2018, URL: http://www.ijtsrd.com/papers/ijtsrd18274.pdf http://www.ijtsrd.com/other-scientific-research-area/other/18274/a-robust-speaker-identification-system/zaw-win-aung
MINIMIZING THE TIME OF SPAM MAIL DETECTION BY RELOCATING FILTERING SYSTEM TO ...IJNSA Journal
Unsolicited Bulk Emails (also known as Spam) are undesirable emails sent to massive number of users. Spam emails consume the network resources and cause lots of security uncertainties. As we studied, the location where the spam filter operates in is an important parameter to preserve network resources. Although there are many different methods to block spam emails, most of program developers only intend to block spam emails from being delivered to their clients. In this paper, we will introduce a new and efficient approach to prevent spam emails from being transferred. The result shows that if we focus on developing a filtering method for spams emails in the sender mail server rather than the receiver mail server, we can detect the spam emails in the shortest time consequently to avoid wasting network resources.
A multi layer architecture for spam-detection systemcsandit
As the email is becoming a prominent mode of communication so are the attempts to misuse it to
take undue advantage of its low cost and high reachability. However, as email communication
is very cheap, spammers are taking advantage of it for advertising their products, for
committing cybercrimes. So, researchers are working hard to combat with the spammers. Many
spam detections techniques and systems are built to fight spammers. But the spammers are
continuously finding new ways to defeat the existing filters. This paper describes the existing
spam filters techniques and proposes a multi-level architecture for spam email detection. We
present the analysis of the architecture to prove the effectiveness of the architecture.
Email spam, also known as junk email or unsolicited
bulk email(UBE), is a subset of electronic spam involving nearly
identical messages sent to numerous recipients by email. Clicking
on links in spam email may send users to phishing web sites or sites
that are hosting malware. Spam email may also include malware as
scripts or other executable file attachments. Definitions of spam
usually include the aspects that email is unsolicited and sent in bulk
In order to overcome spam problem many researchers have
been conducted and various method of anti-spam filtering have
been implemented. A spam filter is a set of instruction for
determining the status of the received email. Spam filters are used
to prevent spam email passing through the recipient. The main
challenge is how to design an effective spam filter that allows
desired email to pass through while blocking the unwanted email.
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...IJNSA Journal
Email is one of the most popular communication media in the current century; it has become an effective
and fast method to share and information exchangeall over the world. In recent years, emails users are
facing problem which is spam emails. Spam emails are unsolicited, bulk emails are sent by spammers. It
consumes storage of mail servers, waste of time and consumes network bandwidth.Many methods used for
spam filtering to classify email messages into two groups spam and non-spam. In general, one of the most
powerful tools used for data classification is Artificial Neural Networks (ANNs); it has the capability of
dealing a huge amount of data with high dimensionality in better accuracy. One important type of ANNs is
the Radial Basis Function Neural Networks (RBFNN) that will be used in this work to classify spam
message. In this paper, we present a new approach of spam filtering technique which combinesRBFNN and
Particles Swarm Optimization (PSO) algorithm (HC-RBFPSO). The proposed approach uses PSO
algorithm to optimize the RBFNN parameters, depending on the evolutionary heuristic search process of
PSO. PSO use to optimize the best position of the RBFNN centers c. The Radii r optimize using K-Nearest
Neighbors algorithmand the weights w optimize using Singular Value Decomposition algorithm within
each iterative process of PSO depending the fitness (error) function. The experiments are conducted on
spam dataset namely SPAMBASE downloaded from UCI Machine Learning Repository. The experimental
results show that our approach is performed in accuracy compared with other approaches that use the
same dataset.
Proposed efficient algorithm to filter spam using machine learning techniques
Ali Shafigh Aski a, *, Navid Khalilzadeh Sourati b
a Department of Computer Engineering, Islamic Azad University, Sari Branch, Islamic Republic of Iran
b Islamic Azad University of Amol, Ayatollah Amoli Branch, Islamic Republic of Iran
Evaluation of Spam Detection and Prevention Frameworks for Email and Image Sp...Pedram Hayati
In recent years, online spam has become a major problem for
the sustainability of the Internet. Excessive amounts of spam
are not only reducing the quality of information available on
the Internet but also creating concern amongst search engines
and web users. This paper aims to analyse existing works in
two different categories of spam domains - email spam and
image spam to gain a deeper understanding of this problem.
Future research directions are also presented in these spam
domains.
More info: http://debii.curtin.edu.au/~pedram/research/publications/76-evaluation-of-spam-detection-and-prevention-frameworks-for-email-and-image-spam-a-state-of-art.html
Spams are unwanted and also undesirable emails which are mass sent to the numerous victims. Further
penetration of spams into electronic processors and communication equipments such as computers and
mobiles as well as lack of control on the information shared on the internet and other communication
networks and also inefficiency of the spam detecting methods developed for Persian contexts are among the
main challenging issues of the Persian subscribers. This paper presents a novel and efficient method for
thematic identification of Persian spams. The proposed method is capable of identifying the Persian, spams
and also “Penglish” spams. “Penglish” is made up of two words Persian and English and demonstrates a
Persian text which is written by English alphabetic letters. Based on the experimental analysis of the 10000
spams of different type the efficiency of the proposed method is evaluated to be more than 98%. The
presented method is also capable of updating its databases taking the advantage of the feedbacks received
from the users.
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMSIJNSA Journal
Email systems have suffered from degraded quality of service due to rampant spam, phishing and fraudulent emails. This is partly because the classification speed of email filtering systems falls far behind the requirements of email service providers. We are motivated to address this issue from the perspective of computer architecture support. In this paper, as the first step towards novel architecture designs, we present extensive performance data collected from measurement and profiling experiments using representative email filtering systems including CRM114, DSPAM, SpamAssassin and TREC Bogofilter. We provide detailed analysis of the time consuming functions in the systems under study. We also show how the processor architecture parameters affect the performance of these email filters through simulation experiments.
Indonesian language email spam detection using N-gram and Naïve Bayes algorithmjournalBEEI
Indonesia is ranked the top 8th out of the total country population in the world for the global spammers. Web-based spam filter service with the REST API type can be used to detect email spam in the Indonesian language on the email server or various types of email server applications. With REST API, then there will be data exchange between the applications with JSON data type using existing HTTP commands. One type of spam filter commonly used is Bayesian Filtering, where the Naïve Bayes algorithm is used as a classification algorithm. Meanwhile, the N-gram method is used to increase the accuracy of the implementation of the Naïve Bayes algorithm in this study. N-gram and Naïve Bayes algorithms to detect spam email in the Indonesian language have successfully been implemented with accuracy around 0.615 until 0.94, precision at 0.566 until 0.924, recall at 0.96 until 1.00, and F-measure at 0.721 until 0.942. The best solution is found by using the 5-gram method with the highest score of accuracy at 0.94, precision at 0.924, recall at 0.96, and F-measure value at 0.942.
Spam filtering by using Genetic based Feature SelectionEditor IJCATR
Spam is defined as redundant and unwanted electronica letters, and nowadays, it has created many problems in business life such as
occupying networks bandwidth and the space of user’s mailbox. Due to these problems, much research has been carried out in this
regard by using classification technique. The resent research show that feature selection can have positive effect on the efficiency of
machine learning algorithm. Most algorithms try to present a data model depending on certain detection of small set of features.
Unrelated features in the process of making model result in weak estimation and more computations. In this research it has been tried
to evaluate spam detection in legal electronica letters, and their effect on several Machin learning algorithms through presenting a
feature selection method based on genetic algorithm. Bayesian network and KNN classifiers have been taken into account in
classification phase and spam base dataset is used.
This paper is aimed to implement a robust speaker identification system. It is a software architecture which identifies the current talker out of a set of speakers. The system is emphasized on text-dependent speaker identification system. It contains three main modules: endpoint detection, feature extraction and feature matching. The additional module, endpoint detection, removes unwanted signal and background noise from the input speech signal before subsequent processing. In the proposed system, Short-Term Energy analysis is used for endpoint detection. Mel-frequency Cepstrum Coefficients (MFCC) is applied for feature extraction to extract a small amount of data from the voice signal that can later be used to represent each speaker. For feature matching, Vector Quantization (VQ) approach using Linde, Buzo and Gray (LBG) clustering algorithm is proposed because it can reduce the amount of data and complexity. The experimental study shows that the proposed system is more robust than using the original system and it is faster in computation than the existing one. To implement this system MATLAB is used for programming. Zaw Win Aung"A Robust Speaker Identification System" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-5 , August 2018, URL: http://www.ijtsrd.com/papers/ijtsrd18274.pdf http://www.ijtsrd.com/other-scientific-research-area/other/18274/a-robust-speaker-identification-system/zaw-win-aung
A multi layer architecture for spam-detection systemcsandit
As the email is becoming a prominent mode of commun
ication so are the attempts to misuse it to
take undue advantage of its low cost and high reach
ability. However, as email communication
is very cheap, spammers are taking advantage of it
for advertising their products, for
committing cybercrimes. So, researchers are working
hard to combat with the spammers. Many
spam detections techniques and systems are built to
fight spammers. But the spammers are
continuously finding new ways to defeat the existin
g filters. This paper describes the existing
spam filters techniques and proposes a multi-level
architecture for spam email detection. We
present the analysis of the architecture to prove t
he effectiveness of the architecture
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Editor IJCATR
Bayesian classifier works efficiently on some fields, and badly on some. The performance of Bayesian Classifier suffers in fields that involve correlated features. Feature selection is beneficial in reducing dimensionality, removing irrelevant data, incrementing learning accuracy, and improving result comprehensibility. But, the recent increase of dimensionality of data place a hard challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this paper, Bayesian Classifier with Correlation Based Feature Selection is introduced which can key out relevant features as well as redundancy among relevant features without pair wise correlation analysis. The efficiency and effectiveness of our method is presented through broad.
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Editor IJCATR
Bayesian classifier works efficiently on some fields, and badly on some. The performance of Bayesian Classifier suffers in fields that involve correlated features. Feature selection is beneficial in reducing dimensionality, removing irrelevant data, incrementing learning accuracy, and improving result comprehensibility. But, the recent increase of dimensionality of data place a hard challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this paper, Bayesian Classifier with Correlation Based Feature Selection is introduced which can key out relevant features as well as redundancy among relevant features without pair wise correlation analysis. The efficiency and effectiveness of our method is presented through broad.
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Editor IJCATR
Bayesian classifier works efficiently on some fields, and badly on some. The performance of Bayesian Classifier suffers in
fields that involve correlated features. Feature selection is beneficial in reducing dimensionality, removing irrelevant data,
incrementing learning accuracy, and improving result comprehensibility. But, the recent increase of dimensionality of data place a hard
challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this paper, Bayesian Classifier
with Correlation Based Feature Selection is introduced which can key out relevant features as well as redundancy among relevant
features without pair wise correlation analysis. The efficiency and effectiveness of our method is presented through broad.
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Editor IJCATR
Bayesian classifier works efficiently on some fields, and badly on some. The performance of Bayesian Classifier suffers in
fields that involve correlated features. Feature selection is beneficial in reducing dimensionality, removing irrelevant data,
incrementing learning accuracy, and improving result comprehensibility. But, the recent increase of dimensionality of data place a hard
challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this paper, Bayesian Classifier
with Correlation Based Feature Selection is introduced which can key out relevant features as well as redundancy among relevant
features without pair wise correlation analysis. The efficiency and effectiveness of our method is presented through broad.
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...IJNSA Journal
Electronic mail, commonly known as email, is a crucial technology that enables streamlined operations and communications in corporate environments. Empowering swift and dependable transactions, email is a driving force behind heightened productivity and organizational effectiveness. However, its versatility also renders it susceptible to misuse by cybercriminals engaging in activities such as hacking, spoofing, phishing, email bombing, whaling, and spamming. As a result, effective and efficient data analysis is important in avoiding and detecting cyber-attacks and crime on times. To overcome the above challenges, a novel approach named Aquila Optimization (AO) is used in this paper to find the best set of hyperparameters of the Stacked Auto Encoder (SAE) classifier. The purpose of increasing the hyperparameters of the SAE using the AO is to obtain a higher text classification accuracy. Then the optimized SAE classifies the selected features into different classes. The experimental results showed that the proposed AO-SAE model outperforms the existing models such as Logistic Regression (LR) and Long Short-Term Model based Gated Current Unit (LSTM based GRU) in terms of Accuracy.
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...IJNSA Journal
Email is one of the most popular communication media in the current century; it has become an effective and fast method to share and information exchangeall over the world. In recent years, emails users are facing problem which is spam emails. Spam emails are unsolicited, bulk emails are sent by spammers. It consumes storage of mail servers, waste of time and consumes network bandwidth.Many methods used for
spam filtering to classify email messages into two groups spam and non-spam. In general, one of the most powerful tools used for data classification is Artificial Neural Networks (ANNs); it has the capability of dealing a huge amount of data with high dimensionality in better accuracy. One important type of ANNs is the Radial Basis Function Neural Networks (RBFNN) that will be used in this work to classify spam message. In this paper, we present a new approach of spam filtering technique which combines RBFNN and Particles Swarm Optimization (PSO) algorithm (HC-RBFPSO). The proposed approach uses PSO
algorithm to optimize the RBFNN parameters, depending on the evolutionary heuristic search process of PSO. PSO use to optimize the best position of the RBFNN centers c. The Radii r optimize using K-Nearest Neighbors algorithmand the weights w optimize using Singular Value Decomposition algorithm within
each iterative process of PSO depending the fitness (error) function. The experiments are conducted on spam dataset namely SPAMBASE downloaded from UCI Machine Learning Repository. The experimental results show that our approach is performed in accuracy compared with other approaches that use the same dataset.
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...IJNSA Journal
Email is one of the most popular communication media in the current century; it has become an effective and fast method to share and information exchangeall over the world. In recent years, emails users are facing problem which is spam emails. Spam emails are unsolicited, bulk emails are sent by spammers. It consumes storage of mail servers, waste of time and consumes network bandwidth.Many methods used for spam filtering to classify email messages into two groups spam and non-spam. In general, one of the most powerful tools used for data classification is Artificial Neural Networks (ANNs); it has the capability of dealing a huge amount of data with high dimensionality in better accuracy. One important type of ANNs is the Radial Basis Function Neural Networks (RBFNN) that will be used in this work to classify spam message. In this paper, we present a new approach of spam filtering technique which combinesRBFNN and Particles Swarm Optimization (PSO) algorithm (HC-RBFPSO). The proposed approach uses PSO algorithm to optimize the RBFNN parameters, depending on the evolutionary heuristic search process of PSO. PSO use to optimize the best position of the RBFNN centers c. The Radii r optimize using K-Nearest Neighbors algorithmand the weights w optimize using Singular Value Decomposition algorithm within each iterative process of PSO depending the fitness (error) function. The experiments are conducted on spam dataset namely SPAMBASE downloaded from UCI Machine Learning Repository. The experimental results show that our approach is performed in accuracy compared with other approaches that use the same dataset.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Text Mining in Digital Libraries using OKAPI BM25 ModelEditor IJCATR
The emergence of the internet has made vast amounts of information available and easily accessible online. As a result, most libraries have digitized their content in order to remain relevant to their users and to keep pace with the advancement of the internet. However, these digital libraries have been criticized for using inefficient information retrieval models that do not perform relevance ranking to the retrieved results. This paper proposed the use of OKAPI BM25 model in text mining so as means of improving relevance ranking of digital libraries. Okapi BM25 model was selected because it is a probability-based relevance ranking algorithm. A case study research was conducted and the model design was based on information retrieval processes. The performance of Boolean, vector space, and Okapi BM25 models was compared for data retrieval. Relevant ranked documents were retrieved and displayed at the OPAC framework search page. The results revealed that Okapi BM 25 outperformed Boolean model and Vector Space model. Therefore, this paper proposes the use of Okapi BM25 model to reward terms according to their relative frequencies in a document so as to improve the performance of text mining in digital libraries.
Green Computing, eco trends, climate change, e-waste and eco-friendlyEditor IJCATR
This study focused on the practice of using computing resources more efficiently while maintaining or increasing overall performance. Sustainable IT services require the integration of green computing practices such as power management, virtualization, improving cooling technology, recycling, electronic waste disposal, and optimization of the IT infrastructure to meet sustainability requirements. Studies have shown that costs of power utilized by IT departments can approach 50% of the overall energy costs for an organization. While there is an expectation that green IT should lower costs and the firm’s impact on the environment, there has been far less attention directed at understanding the strategic benefits of sustainable IT services in terms of the creation of customer value, business value and societal value. This paper provides a review of the literature on sustainable IT, key areas of focus, and identifies a core set of principles to guide sustainable IT service design.
Policies for Green Computing and E-Waste in NigeriaEditor IJCATR
Computers today are an integral part of individuals’ lives all around the world, but unfortunately these devices are toxic to the environment given the materials used, their limited battery life and technological obsolescence. Individuals are concerned about the hazardous materials ever present in computers, even if the importance of various attributes differs, and that a more environment -friendly attitude can be obtained through exposure to educational materials. In this paper, we aim to delineate the problem of e-waste in Nigeria and highlight a series of measures and the advantage they herald for our country and propose a series of action steps to develop in these areas further. It is possible for Nigeria to have an immediate economic stimulus and job creation while moving quickly to abide by the requirements of climate change legislation and energy efficiency directives. The costs of implementing energy efficiency and renewable energy measures are minimal as they are not cash expenditures but rather investments paid back by future, continuous energy savings.
Performance Evaluation of VANETs for Evaluating Node Stability in Dynamic Sce...Editor IJCATR
Vehicular ad hoc networks (VANETs) are a favorable area of exploration which empowers the interconnection amid the movable vehicles and between transportable units (vehicles) and road side units (RSU). In Vehicular Ad Hoc Networks (VANETs), mobile vehicles can be organized into assemblage to promote interconnection links. The assemblage arrangement according to dimensions and geographical extend has serious influence on attribute of interaction .Vehicular ad hoc networks (VANETs) are subclass of mobile Ad-hoc network involving more complex mobility patterns. Because of mobility the topology changes very frequently. This raises a number of technical challenges including the stability of the network .There is a need for assemblage configuration leading to more stable realistic network. The paper provides investigation of various simulation scenarios in which cluster using k-means algorithm are generated and their numbers are varied to find the more stable configuration in real scenario of road.
Optimum Location of DG Units Considering Operation ConditionsEditor IJCATR
The optimal sizing and placement of Distributed Generation units (DG) are becoming very attractive to researchers these days. In this paper a two stage approach has been used for allocation and sizing of DGs in distribution system with time varying load model. The strategic placement of DGs can help in reducing energy losses and improving voltage profile. The proposed work discusses time varying loads that can be useful for selecting the location and optimizing DG operation. The method has the potential to be used for integrating the available DGs by identifying the best locations in a power system. The proposed method has been demonstrated on 9-bus test system.
Analysis of Comparison of Fuzzy Knn, C4.5 Algorithm, and Naïve Bayes Classifi...Editor IJCATR
Early detection of diabetes mellitus (DM) can prevent or inhibit complication. There are several laboratory test that must be done to detect DM. The result of this laboratory test then converted into data training. Data training used in this study generated from UCI Pima Database with 6 attributes that were used to classify positive or negative diabetes. There are various classification methods that are commonly used, and in this study three of them were compared, which were fuzzy KNN, C4.5 algorithm and Naïve Bayes Classifier (NBC) with one identical case. The objective of this study was to create software to classify DM using tested methods and compared the three methods based on accuracy, precision, and recall. The results showed that the best method was Fuzzy KNN with average and maximum accuracy reached 96% and 98%, respectively. In second place, NBC method had respective average and maximum accuracy of 87.5% and 90%. Lastly, C4.5 algorithm had average and maximum accuracy of 79.5% and 86%, respectively.
Web Scraping for Estimating new Record from Source SiteEditor IJCATR
Study in the Competitive field of Intelligent, and studies in the field of Web Scraping, have a symbiotic relationship mutualism. In the information age today, the website serves as a main source. The research focus is on how to get data from websites and how to slow down the intensity of the download. The problem that arises is the website sources are autonomous so that vulnerable changes the structure of the content at any time. The next problem is the system intrusion detection snort installed on the server to detect bot crawler. So the researchers propose the use of the methods of Mining Data Records and the method of Exponential Smoothing so that adaptive to changes in the structure of the content and do a browse or fetch automatically follow the pattern of the occurrences of the news. The results of the tests, with the threshold 0.3 for MDR and similarity threshold score 0.65 for STM, using recall and precision values produce f-measure average 92.6%. While the results of the tests of the exponential estimation smoothing using ? = 0.5 produces MAE 18.2 datarecord duplicate. It slowed down to 3.6 datarecord from 21.8 datarecord results schedule download/fetch fix in an average time of occurrence news.
Evaluating Semantic Similarity between Biomedical Concepts/Classes through S...Editor IJCATR
Most of the existing semantic similarity measures that use ontology structure as their primary source can measure semantic similarity between concepts/classes using single ontology. The ontology-based semantic similarity techniques such as structure-based semantic similarity techniques (Path Length Measure, Wu and Palmer’s Measure, and Leacock and Chodorow’s measure), information content-based similarity techniques (Resnik’s measure, Lin’s measure), and biomedical domain ontology techniques (Al-Mubaid and Nguyen’s measure (SimDist)) were evaluated relative to human experts’ ratings, and compared on sets of concepts using the ICD-10 “V1.0” terminology within the UMLS. The experimental results validate the efficiency of the SemDist technique in single ontology, and demonstrate that SemDist semantic similarity techniques, compared with the existing techniques, gives the best overall results of correlation with experts’ ratings.
Semantic Similarity Measures between Terms in the Biomedical Domain within f...Editor IJCATR
The techniques and tests are tools used to define how measure the goodness of ontology or its resources. The similarity between biomedical classes/concepts is an important task for the biomedical information extraction and knowledge discovery. However, most of the semantic similarity techniques can be adopted to be used in the biomedical domain (UMLS). Many experiments have been conducted to check the applicability of these measures. In this paper, we investigate to measure semantic similarity between two terms within single ontology or multiple ontologies in ICD-10 “V1.0” as primary source, and compare my results to human experts score by correlation coefficient.
A Strategy for Improving the Performance of Small Files in Openstack Swift Editor IJCATR
This is an effective way to improve the storage access performance of small files in Openstack Swift by adding an aggregate storage module. Because Swift will lead to too much disk operation when querying metadata, the transfer performance of plenty of small files is low. In this paper, we propose an aggregated storage strategy (ASS), and implement it in Swift. ASS comprises two parts which include merge storage and index storage. At the first stage, ASS arranges the write request queue in chronological order, and then stores objects in volumes. These volumes are large files that are stored in Swift actually. During the short encounter time, the object-to-volume mapping information is stored in Key-Value store at the second stage. The experimental results show that the ASS can effectively improve Swift's small file transfer performance.
Integrated System for Vehicle Clearance and RegistrationEditor IJCATR
Efficient management and control of government's cash resources rely on government banking arrangements. Nigeria, like many low income countries, employed fragmented systems in handling government receipts and payments. Later in 2016, Nigeria implemented a unified structure as recommended by the IMF, where all government funds are collected in one account would reduce borrowing costs, extend credit and improve government's fiscal policy among other benefits to government. This situation motivated us to embark on this research to design and implement an integrated system for vehicle clearance and registration. This system complies with the new Treasury Single Account policy to enable proper interaction and collaboration among five different level agencies (NCS, FRSC, SBIR, VIO and NPF) saddled with vehicular administration and activities in Nigeria. Since the system is web based, Object Oriented Hypermedia Design Methodology (OOHDM) is used. Tools such as Php, JavaScript, css, html, AJAX and other web development technologies were used. The result is a web based system that gives proper information about a vehicle starting from the exact date of importation to registration and renewal of licensing. Vehicle owner information, custom duty information, plate number registration details, etc. will also be efficiently retrieved from the system by any of the agencies without contacting the other agency at any point in time. Also number plate will no longer be the only means of vehicle identification as it is presently the case in Nigeria, because the unified system will automatically generate and assigned a Unique Vehicle Identification Pin Number (UVIPN) on payment of duty in the system to the vehicle and the UVIPN will be linked to the various agencies in the management information system.
Assessment of the Efficiency of Customer Order Management System: A Case Stu...Editor IJCATR
The Supermarket Management System deals with the automation of buying and selling of good and services. It includes both sales and purchase of items. The project Supermarket Management System is to be developed with the objective of making the system reliable, easier, fast, and more informative.
Energy-Aware Routing in Wireless Sensor Network Using Modified Bi-Directional A*Editor IJCATR
Energy is a key component in the Wireless Sensor Network (WSN)[1]. The system will not be able to run according to its function without the availability of adequate power units. One of the characteristics of wireless sensor network is Limitation energy[2]. A lot of research has been done to develop strategies to overcome this problem. One of them is clustering technique. The popular clustering technique is Low Energy Adaptive Clustering Hierarchy (LEACH)[3]. In LEACH, clustering techniques are used to determine Cluster Head (CH), which will then be assigned to forward packets to Base Station (BS). In this research, we propose other clustering techniques, which utilize the Social Network Analysis approach theory of Betweeness Centrality (BC) which will then be implemented in the Setup phase. While in the Steady-State phase, one of the heuristic searching algorithms, Modified Bi-Directional A* (MBDA *) is implemented. The experiment was performed deploy 100 nodes statically in the 100x100 area, with one Base Station at coordinates (50,50). To find out the reliability of the system, the experiment to do in 5000 rounds. The performance of the designed routing protocol strategy will be tested based on network lifetime, throughput, and residual energy. The results show that BC-MBDA * is better than LEACH. This is influenced by the ways of working LEACH in determining the CH that is dynamic, which is always changing in every data transmission process. This will result in the use of energy, because they always doing any computation to determine CH in every transmission process. In contrast to BC-MBDA *, CH is statically determined, so it can decrease energy usage.
Security in Software Defined Networks (SDN): Challenges and Research Opportun...Editor IJCATR
In networks, the rapidly changing traffic patterns of search engines, Internet of Things (IoT) devices, Big Data and data centers has thrown up new challenges for legacy; existing networks; and prompted the need for a more intelligent and innovative way to dynamically manage traffic and allocate limited network resources. Software Defined Network (SDN) which decouples the control plane from the data plane through network vitalizations aims to address these challenges. This paper has explored the SDN architecture and its implementation with the OpenFlow protocol. It has also assessed some of its benefits over traditional network architectures, security concerns and how it can be addressed in future research and related works in emerging economies such as Nigeria.
Measure the Similarity of Complaint Document Using Cosine Similarity Based on...Editor IJCATR
Report handling on "LAPOR!" (Laporan, Aspirasi dan Pengaduan Online Rakyat) system depending on the system administrator who manually reads every incoming report [3]. Read manually can lead to errors in handling complaints [4] if the data flow is huge and grows rapidly, it needs at least three days to prepare a confirmation and it sensitive to inconsistencies [3]. In this study, the authors propose a model that can measure the identities of the Query (Incoming) with Document (Archive). The authors employed Class-Based Indexing term weighting scheme, and Cosine Similarities to analyse document similarities. CoSimTFIDF, CoSimTFICF and CoSimTFIDFICF values used in classification as feature for K-Nearest Neighbour (K-NN) classifier. The optimum result evaluation is pre-processing employ 75% of training data ratio and 25% of test data with CoSimTFIDF feature. It deliver a high accuracy 84%. The k = 5 value obtain high accuracy 84.12%
Hangul Recognition Using Support Vector MachineEditor IJCATR
The recognition of Hangul Image is more difficult compared with that of Latin. It could be recognized from the structural arrangement. Hangul is arranged from two dimensions while Latin is only from the left to the right. The current research creates a system to convert Hangul image into Latin text in order to use it as a learning material on reading Hangul. In general, image recognition system is divided into three steps. The first step is preprocessing, which includes binarization, segmentation through connected component-labeling method, and thinning with Zhang Suen to decrease some pattern information. The second is receiving the feature from every single image, whose identification process is done through chain code method. The third is recognizing the process using Support Vector Machine (SVM) with some kernels. It works through letter image and Hangul word recognition. It consists of 34 letters, each of which has 15 different patterns. The whole patterns are 510, divided into 3 data scenarios. The highest result achieved is 94,7% using SVM kernel polynomial and radial basis function. The level of recognition result is influenced by many trained data. Whilst the recognition process of Hangul word applies to the type 2 Hangul word with 6 different patterns. The difference of these patterns appears from the change of the font type. The chosen fonts for data training are such as Batang, Dotum, Gaeul, Gulim, Malgun Gothic. Arial Unicode MS is used to test the data. The lowest accuracy is achieved through the use of SVM kernel radial basis function, which is 69%. The same result, 72 %, is given by the SVM kernel linear and polynomial.
Application of 3D Printing in EducationEditor IJCATR
This paper provides a review of literature concerning the application of 3D printing in the education system. The review identifies that 3D Printing is being applied across the Educational levels [1] as well as in Libraries, Laboratories, and Distance education systems. The review also finds that 3D Printing is being used to teach both students and trainers about 3D Printing and to develop 3D Printing skills.
Survey on Energy-Efficient Routing Algorithms for Underwater Wireless Sensor ...Editor IJCATR
In underwater environment, for retrieval of information the routing mechanism is used. In routing mechanism there are three to four types of nodes are used, one is sink node which is deployed on the water surface and can collect the information, courier/super/AUV or dolphin powerful nodes are deployed in the middle of the water for forwarding the packets, ordinary nodes are also forwarder nodes which can be deployed from bottom to surface of the water and source nodes are deployed at the seabed which can extract the valuable information from the bottom of the sea. In underwater environment the battery power of the nodes is limited and that power can be enhanced through better selection of the routing algorithm. This paper focuses the energy-efficient routing algorithms for their routing mechanisms to prolong the battery power of the nodes. This paper also focuses the performance analysis of the energy-efficient algorithms under which we can examine the better performance of the route selection mechanism which can prolong the battery power of the node
Comparative analysis on Void Node Removal Routing algorithms for Underwater W...Editor IJCATR
The designing of routing algorithms faces many challenges in underwater environment like: propagation delay, acoustic channel behaviour, limited bandwidth, high bit error rate, limited battery power, underwater pressure, node mobility, localization 3D deployment, and underwater obstacles (voids). This paper focuses the underwater voids which affects the overall performance of the entire network. The majority of the researchers have used the better approaches for removal of voids through alternate path selection mechanism but still research needs improvement. This paper also focuses the architecture and its operation through merits and demerits of the existing algorithms. This research article further focuses the analytical method of the performance analysis of existing algorithms through which we found the better approach for removal of voids
Decay Property for Solutions to Plate Type Equations with Variable CoefficientsEditor IJCATR
In this paper we consider the initial value problem for a plate type equation with variable coefficients and memory in
1 n R n ), which is of regularity-loss property. By using spectrally resolution, we study the pointwise estimates in the spectral
space of the fundamental solution to the corresponding linear problem. Appealing to this pointwise estimates, we obtain the global
existence and the decay estimates of solutions to the semilinear problem by employing the fixed point theorem
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Identification of Spam Emails from Valid Emails by Using Voting
1. International Journal of Computer Applications Technology and Research
Volume 5– Issue 3, 132 - 136, 2016, ISSN:- 2319–8656
www.ijcat.com 132
Identification of Spam Emails from Valid Emails by
Using Voting
Hamoon Takhmiri
Computer Science and Technology
Islamic Azad University Kish International Branch
Kish Island, Iran
Ali Haroonabadi
Islamic Azad University Kish International Branch
Kish Island, Iran
Abstract: In recent years, the increasing use of e-mails has led to the emergence and increase of problems caused by mass unwanted
messages which are commonly known as spam. In this study, by using decision trees, support vector machine, Naïve Bayes theorem
and voting algorithm, a new version for identifying and classifying spams is provided. In order to verify the proposed method, a set of
a mails are chosen to get tested. First three algorithms try to detect spams, and then by using voting method, spams are identified. The
advantage of this method is utilizing a combination of three algorithms at the same time: decision tree, support vector machine and
Naïve Bayes method. During the evaluation of this method, a data set is analyzed by Weka software. Charts prepared in spam
detection indicate improved accuracy compared to the previous methods.
Keywords: Spam emails; tree decision; Naïve Bayes; support vector machine; voting algorithm
1. INTRODUCTION
In recent years, the increasing use of e-mails has led to the
emergence and increase of the problems caused by unwanted
bulk email messages commonly known as spam. By changing
the common and aggressive content of some of these
messages from a minor harassment to a major concern, spams
began to reduce the reliability of emails. Personal users and
companies that are affected by spams because of network
bandwidth spent a lot of time for receiving these messages
and distinguishing spam messages from standard messages
(legally certified) for users. A business model based on spams
for buying and selling is usually beneficial because the costs
for the sender is little, so a lot of messages can be sent by
maximizing replies and this aggressive performance is a
feature of known spammers. Economic functions of spams
have forced some countries to legislate for them. In addition,
problems of pursuing the transmitters of these messages can
limit the performance of such laws. In addition to legislation,
some institutions have imposed changes in protocols and
practical models. Another approach being implemented is
using spam classifiers which by analyzing message content
and additional information try to identify spam messages. For
using this function once, messages are identified usually
based on the settings used by classifier [1]. If classifiers are
used by a single user, as a customer-focused classifier,
messages are usually sent to a folder that only contain
messages under the title of spam and this makes it easier to
identify the messages. On the other hand, if the classifier
works on the Email server, by checking multiple users’
messages, they can be tagged as spam or get deleted. Another
possibility is a multi-user setting in which classifiers running
on different machines share information about the received
messages to improve their performance. Generally, the use of
classifiers has created an evolutionary scenario in which
spammers use instruments with different ways, in particular,
regulated methods for reducing the number of messages
identified. Initially, spam classifiers were based on the user's
known laws and were designed on the basis of arrangements
that are easily seen in such messages. [2].
2. RELATED WORK
Filters have been relied on key word patterns. In order to be
efficient and avoid the risk of accidental deletion, non-spam
messages are called as legitimate messages or Ham. These
patterns should be checked manually by each user’s email.
However, fine adjustment of patterns requires time and
expertise which is unfortunately not always available. Even
messages’ features change over time which requires key word
patterns to be updated regularly [3]. Thus, processing
messages and detecting spams or non-spams automatically is
desirable. It should be noted that text categorization methods
can be effective in anti-spam filtering. Sending bulk
unsolicited message makes it a spam message, not its real
content. In fact, posting a bulk message carelessly makes the
message a spam. Phenomena can be images, sounds, or any
other data, but important thing is to distinguish between
different samples and have a good reaction for every sample.
Learning is usually used in one of the following ways:
Statistical, synthetic or neural [4].
Recognition of Statistical pattern by assuming that these
patterns are created by a possible system is determined based
on the statistical properties of patterns. Some of important
reasons for spamming include economic purposes, as well as
promoting a product, service or a particular idea, tricking
users to use their confidential information, transmission of
malicious software to the user's computer, creating a
temporary email server crash, generating traffic and
broadcasting immoral contents. Spams are constantly
changing their content and form to avoid detection by anti-
spams. Some ways to prevent spamming include:
• Economic methods: getting cash to send e-mail: Zmail
Protocols
• Legislative procedures: such as CAN-SPAM laws, securing
email transmission platform
• Changing the e-mail transmission protocol and providing
alternative protocols such as sending id and features
• Controlling your outgoing emails versus controlling
incoming mails.
2. International Journal of Computer Applications Technology and Research
Volume 5– Issue 3, 132 - 136, 2016, ISSN:- 2319–8656
www.ijcat.com 133
• filtering based on learning (statistical) and by using the mail
features
• Mail detection phishing (fake pages) to help fuzzy
classification methods
• Controlling methods: Controlling the features of a mail
before sending it by e-mail server
3. SUGGESTED METHOD
For better identification of spams, the aim is initially finding
behavioral features of spam, so the data mining and logging of
spam behavior is required at first, such as detecting the IP of
sender, sending time, frequency, number of attachments and
so on. This information is stored in the database so that they
are structure information. Behavioral attributes of spams can
be extracted from these reports created in the mail server [5].
Before extracting data, analyzing the features of e-mails from
reports is required. Information obtaining technology is
selected to analyze these features and the main feature is
obtained. Some features of fewer information and weaker
relations are deleted. The behavioral feature of an individual
e-mail is as follows:
Customer IP (CIP)
Receive Time (RT)
Context Length (CL)
Frequency (FRQ)
Context Type (CT)
Protocol Validation (PV)
Receiver Number (RN)
Attachment Number (AN)
Server IP (SIP)
There are no entirely clear features in real world, so that after
fuzzification, normal and logical degrees below horizon can
be explained for characterization of samples. After
preprocessing, the value of information is as follows:
A) Customer IP: is used only to calculate the frequency of
sender and extracting the common behavioral pattern of
sender and is not involved in the decision tree computing.
B) Receive Time: the value of day and night time is a
common value and requires fuzzification for horizontal degree
(0, 1).
C) Context Length: the value of short and long for the size of
email is also a common value feature and requires
fuzzification too.
D) Protocol Validation: is Boolean and when it complies with
the sender is (1) and in case of non-compliance is (0).
E) Context Type (CT): The value a text or Multipart is (1) for
the text and (0) for the Multipart.
F) Receiver Number (RN): the value of many and less is a
common value feature and also requires fuzzification.
G) Frequency (FRQ): the value of often and seldom frequency
is a common value feature and also requires fuzzification.
H) Attachment Number (AN): the value of many and less is a
common value feature and also requires fuzzification. Table 1
lists some examples of the results after preprocessing.
Table 1 Result Of Data Processing
It is assumed that (A, B) are fuzzy subsets defined in a
confined space (F). “If A so B” is named as a fuzzy rule and
simply recorded as (A → B), which is called fuzzy sets of
conditions, and (B) is called fuzzy sets of conclusion. Based
on the knowledge of decision tree, rules were classified as
fuzzy and are in the form of "if - then". A rule is created for
each path from the root to the leaves. Simultaneously with a
special path, any value features are as a pair of (And) piece of
a rule that is called prior rule. (Section If) predicts leaf node
classification, so forms the compliance rule. (Section Then) of
"if - then" rule are easier to understand, especially when the
tree is large [6].
Figure 1 Decision tree
After checking decision trees and identifying important
characteristics of a mail from the proposed decision tree in
Figure 1 decision tree, Mamdani decision tree rules are
generated as follows:
1. If the protocol (PV) of sending email is not authentic, the
email is spam.
2. If e-mail protocol (PV) is valid, frequency (FRQ) is less
and type (CT) is Multipart, the e-mail is spam.
3. International Journal of Computer Applications Technology and Research
Volume 5– Issue 3, 132 - 136, 2016, ISSN:- 2319–8656
www.ijcat.com 134
3. If e-mail protocol (PV) is valid, frequency (FRQ) is many
and context length (CL) is long, the e-mail is spam.
4. If e-mail protocol (PV) is valid, frequency (FRQ) is many,
context length (CL) is short and receiver number (RN) is
many, the e-mail is spam.
To explain the Naïve Bayes method, this example is discussed
as follows:
Figure 2 Separating non-spam mails from spams
P(Ci) : P(Class=Spam) = S / N
P(Ci) : P(Class=Ham) = H / N
The total occurrence of the word Free in spam
After obtaining the threshold (X1, X2), if the number of
occurrences of the word Free in 21st spam email is closed to
the X1, that mail is spam and if it is closed to X2, it is HAM.
The same operations on other components can be checked [7].
Using SVM (Linear support vector machine) in classification
issues is a new approach that in recent years has become an
attractive subject for many and is used in a wide range of
applications, including OCR, handwriting recognition, signs
recognition and so on. SVM approach is that in the training
phase, they try to maintain the decision boundary in such a
way that its minimum distance to each of the considered
categories becomes maximum distance. This choice makes the
decision practically to tolerate the noisy environment and
have a good response. This boundary selection method is
based on the points called support vectors. Thus in the
training phase, general characteristics of spam are extracted
according to the data analysis obtained from data collection
and training is done based on it. Then testing phase was
performed based on the mentioned cases and each time
compared with original data until the results became
optimized [8]. The proposed method is as follows: first
measurement criteria for spam are determined which contains
implicit (non-material) and explicit (content). Implicit cases
that are analyzed by decision tree include protocol type,
content length, content type, time, frequency, receiver
number, etc. explicit cases are determined by Naïve Bayes
and support vector, such as the repetition of words Victory,
Win, three zeros in a row, and so on. In fact, the desired
dataset is a combination of implicit properties inside the
decision trees and explicit characteristics used in Naïve Bayes
method and vector machine and the results of all three
methods are surveyed by voting. In other words, the implicit
characteristics of the data set were analyzed by decision tree
algorithm and the obtained results are completed through
fuzzy - Mamdani rules [9]. Then explicit characteristics in
Naïve Bayes rule and support vector machine are evaluated
and the results of all three methods are surveyed by voting.
Each of the mails inside the data collection are entered into
Naïve Bayes, support vector machine and decision tree. If at
least two of the three proposed algorithms were determined
correctly, or in other words, at least two of the three proposed
algorithms are like minded, the result is acceptable. In this
method, results are divided in two groups: high reliability for
all three algorithms being likeminded and average reliability
for two of the three proposed algorithms being likeminded.
Ultimately in the final test, data set was divided into four parts
by K-fold method and the first quarter was analyzed for
testing and the rest for learning; next, the second quarter of
data set was analyzed for testing and first, third and fourth
quarters were analyzed for learning; then the third quarter was
analyzed for testing and first, second and fourth quarters were
analyzed for learning; also the fourth quarter was considered
for testing and first, second and third quarters were considered
for learning [10].
4. RESULT AND DISCUSSION
Dataset for implementing the proposed method contain 1000
emails in which 300 (30%) are spam and 700 (70 %) are non-
spam. In the last column of this data set, there is a (Class)
column and inserting number 1 in this column means spam
and inserting zero means non-spam for every mail. Examples
of keywords for the explicit section regarding the
implementation of the Naïve Bayes method and support
vector machine include:
Victory, Money, Win, Lottery, 000, ###,…
Another section of this data set containing the implicit
characteristics may be used to implement the decision trees,
including:
Sending time, type of text, text length, frequency, receiver
number, sender number and...
The purpose of testing the proposed data set is to assess the
accuracy of detection of the proposed method and showing
better spam detections compared with Naïve Bayes method,
support vector machine, or decision tree [11]. After analyzing
the data set inside the Naïve Bayes method and extracting
4. International Journal of Computer Applications Technology and Research
Volume 5– Issue 3, 132 - 136, 2016, ISSN:- 2319–8656
www.ijcat.com 135
efficiency and accuracy percentages and bright-dark spots,
that same data set inside the decision tree and support vector
machine is analyzed and its accuracy and efficiency
percentages is calculated and the results will be analyzed by
voting method. Then results with the like-mindedness of at
least two of the three proposed algorithms algorithm are
determined as the final results. To demonstrate the efficiency,
f the proposed method is performed by one of the methods
discussed [12]. F-Measure and accuracy criteria were
compared so that the testing data set is divided into ten parts
and hundred, two hundreds, three hundreds to thousand mails
were tested. The results will be compared with the results of
spam swarm optimization method that include negative
selection algorithm and particle swarm optimization on
measurement and accuracy metrics.
Figure 3 Accuracy Compare Between Methods
Figure 4 F-Measure
5. CONCLUSION
This method provides a new approach to detect spam using a
combination of decision tree, Naïve Bayes method, support
vector machine method and surveying by voting to extract the
behavioral patterns of spam. Since there are no entirely clear
features in the real world, degrees below horizon is normal
and reasonable to explain the behavioral characteristics [13].
Decision tree begins identifying the mail and spams in dataset
using fuzzy - Mamdani rules. Then Naive Bayes method is
performed on the selected data set through Bayes formula of
this operation. Then support vector machine algorithm
analyzes the explicit data and voting method by the obtained
percentages from all three methods and fining like-
mindedness of at least two of the three proposed algorithms,
shows the accuracy of detection. Proposed method not only
shows a better performance compared with the independent
use of each of the three methods, but divide’s the detection
into two groups of almost reliable and very reliable by using
the majority of votes in detecting spam and normal mails
(common TP and TN of all three methods).
One of the most important things in determining the optimal
method to detect spam is to minimize the number of non-
spams mails that are known as spam, since finding and
deleting spams between safe e-mails known is simple for the
user while finding a safe mail between spams is usually
difficult and time consuming. To improve the accuracy of
spam detection results, three methods were used and better
statistics were provided for spam detection through voting
method. Comparing the proposed approach with some of
previous methods that have already been done show a better
performance regarding the accuracy of results. Adding a fuzzy
preprocessing level for processing email contents for user was
done by using mail classifications into categories based on
content, subject, sender, sending time, receiver number,
sender number, etc. and the integration of Naïve Bayes
method, decision tree and support vector machine based on
implicit and explicit components of a mail for all the three
methods and classification was done by voting. Using false
positive and negative rates increased the accuracy of statistical
filters for spam detection accuracy and lowered detection
error rate..
6. RECOMMENDATIONS AND
FUTURE WORK
To improve the performance of proposed method in the
future, more details can be evaluated by the development of
decision tree. For example, more detailed non-content cases
including: sending time, sending protocol, content length,
content type, time zone, receiver number, frequency and
number of attachments can increase the accuracy of decision
tree in spam detection. For Naïve Bayes and support vector
machine methods, adding content details, some parts of a mail
such as: subject, content, sender, keywords and user interests,
results in improved performance of these two methods
regarding the classification of spam and non-spam mails.
Finally, using all the three methods in voting algorithm will
show a better efficiency percentage. More K-fold divider and
learning percentage increases proposed method’s accuracy of
detection. In other words, K-fold amount and learning
percentage considered for the proposed method have a direct
relationship with the accuracy of detection. Of course, if
details and K-fold amount exceeds a certain extent,
implementing the method will be more complex.
7. REFERENCES
[1]. Wu, C.T., Cheng, K.T., Zhu, Q., and Wu, Y.L., 2008,
“Using Visual Features For Anti-Spam Filtering”, In
Proceedings of the IEEE International Conference on Image
Processing, Vol. 29, Iss. 1, pp. 63-92.
5. International Journal of Computer Applications Technology and Research
Volume 5– Issue 3, 132 - 136, 2016, ISSN:- 2319–8656
www.ijcat.com 136
[2]. Goodman, J., and Rounthwaite, R., 2004, “Stopping
Outgoing Spam”, In Proceedings of the 5th ACM Conference
on Electronic Commerce, pp. 30-39.
[3]. Siponen, M., and Stucke, C., 2006, “Effective Antispam
Strategies In Companies: An International Study”, In
Proceedings of the 39th IEEE Annual Hawaii International
Conference on Transaction on Spam Detection, Vol. 6, pp.
245-252.
[4]. Cody, S., Cukier, W., and Nesselroth, E., 2006, “Genres
Of Spam: Expectations And Deceptions”, In Proceedings of
the 39th Annual Hawaii International Conference on System
Sciences, Vol. 3, pp. 48-51.
[5]. Golbeck, J., and Hendler, J., 2006, “Reputation Network
Analysis For Email Filtering”, In Proceedings of the First
International Conference on Email and Anti-Spam, pp. 21-23.
[6]. Liang, Z., Jianmin, G., and Jian, H., 2012, “The Research
and Design of an Anti-open Junk Mail Relay System”, In
Proceedings of the First IEEE International Conference on
Computer Science and Service System, pp. 1258-1262.
[7]. Feamster, N., and Ramachandran, A., 2006,
“Understanding The Network-Level Behavior Of Spammers”,
In Proceeding of the 3th ACM Conference on Email and Anti-
Spam, Vol. 36, Iss. 4, pp. 291-302.
[8]. Lili, D., and Yun, W., 2011, “Research And Design Of
ID3 Algorithm Rules-Based Anti-Spam Email Filtering”, In
Proceedings of the Second IEEE International Conference on
Software Engineering and Service Science, pp. 572-575.
[9]. Zhitang, L., and Sheng, Z., 2009, “A Method for Spam
Behavior Recognition Based on Fuzzy Decision Tree”, In
Proceedings of the Ninth IEEE International Conference on
Computer and Information Technology , Vol. 2, pp. 236-241.
[10]. Duquenoy, P., Moustakas, E., and Ranganathan, E.,
2005, “Combating Spam Through Legislation: A Comparative
Analysis Of Us And European Approaches”, In Proceedings
of the Second International Conference on Email and Anti-
Spam,pp. 15-22.
[11]. Jones, L., 2007, “Good Times Virus Hoax FAQ”,
Available: http://cityscope.net/hoax1.html, [Accesed: Jul. 10,
2015].
[12]. Singhal, A., 2007, “An Overview Of Data Warehouse,
Olap And Data Mining Technology”, Springer Science
Business Media, LLC, Vol. 31, pp. 19-23.
[13]. Ismaila, I., and Selamat, A., 2014, “Improved Email
Spam Detection Model With Negative Selection Algorithm
And Particle Swarm Optimization”, Elsevier Journal of
Alliance and Faculty of Computing, Vol. 22, pp. 15-27.