The increasing use of e-mail and the growing number of Internet users sending unsolicited bulk e-mail have created a need for anti-spam filtering. Many filters have been produced in this area, each with its own method and parameters for recognizing spam. The advantage of this method is the simultaneous use of two algorithms: the ID3 decision tree and a fuzzy (Mamdani) Naive Bayesian classifier. The two algorithms first try to detect spam, and a Bagging approach is then used to identify it. In the evaluation, a dataset containing a thousand messages was analyzed with the Weka software; the charts provided show improved spam-detection accuracy compared with previous methods.
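The Bagging step can be illustrated with a minimal Python sketch. It is not the authors' implementation. A one-feature decision stump stands in for the ID3 tree and the fuzzy Naive Bayesian classifier, and the toy data are hypothetical. Each model is trained on a bootstrap sample drawn with replacement, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter

def train_stump(xs, ys):
    """Fit a one-feature decision stump: predict 1 when polarity * (x - t) >= 0.
    (A stand-in base learner; the paper uses ID3 and fuzzy Naive Bayes.)"""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            errors = sum(
                (1 if polarity * (x - t) >= 0 else 0) != y for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    _, t, polarity = best
    return lambda x: 1 if polarity * (x - t) >= 0 else 0

def bagging(xs, ys, n_estimators=25, seed=0):
    """Train n_estimators stumps on bootstrap samples; combine them by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]  # odd ensemble size avoids binary ties
    return predict

# Hypothetical toy data: feature values 0..9, label 1 ("spam") for values >= 5
xs = list(range(10))
ys = [0] * 5 + [1] * 5
predict = bagging(xs, ys)
```

Because each base model sees a slightly different resample, the vote smooths out the variance of any single model. That is the motivation for wrapping unstable learners such as decision trees in Bagging.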
Identification of Spam Emails from Valid Emails by Using Voting (Editor IJCATR)
In recent years, the increasing use of e-mail has led to the emergence and growth of problems caused by unwanted bulk messages, commonly known as spam. This study proposes a new method for identifying and classifying spam that uses decision trees, support vector machines, Naïve Bayes and a voting algorithm. To verify the proposed method, a set of e-mails was selected for testing. The three algorithms first try to detect spam, and the voting method then identifies it. The advantage of this method is that it combines three algorithms at the same time: decision tree, support vector machine and Naïve Bayes. During the evaluation, a dataset was analyzed with the Weka software. The charts prepared for spam detection indicate improved accuracy compared with previous methods.
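Hard majority voting over the three classifiers' outputs can be sketched in a few lines of Python. The sketch assumes each classifier has already produced one label per message. The label lists below are hypothetical, not results from the paper.

```python
from collections import Counter

def majority_vote(*prediction_lists):
    """Combine per-message labels from several classifiers by hard majority voting."""
    combined = []
    for labels in zip(*prediction_lists):  # one tuple of votes per message
        label, _ = Counter(labels).most_common(1)[0]
        combined.append(label)
    return combined

# Hypothetical outputs of the decision tree, SVM and Naive Bayes on four messages
tree = ["spam", "ham", "spam", "ham"]
svm = ["spam", "spam", "ham", "ham"]
nb = ["ham", "spam", "spam", "ham"]
print(majority_vote(tree, svm, nb))  # ['spam', 'spam', 'spam', 'ham']
```

With three voters and two labels, a tie is impossible. Each message is labelled by whichever class at least two classifiers agree on.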
A multi-layer architecture for spam-detection system (csandit)
As email becomes a prominent mode of communication, so do attempts to misuse it and take undue advantage of its low cost and high reachability. Because email communication is very cheap, spammers use it to advertise their products and to commit cybercrimes, and researchers are working hard to combat them. Many spam-detection techniques and systems have been built to fight spammers, but spammers continuously find new ways to defeat existing filters. This paper describes existing spam-filtering techniques and proposes a multi-level architecture for spam email detection. We present an analysis of the architecture to demonstrate its effectiveness.
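The abstract does not spell out the layers of the proposed architecture. The following Python code is therefore only a hypothetical sketch of a generic layered filter. Each layer either returns a verdict or passes the message on, and the first decisive layer wins. The layer names, sender addresses and keyword list are invented for illustration.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]
Layer = Callable[[Message], Optional[str]]

def allowlist_layer(trusted: set) -> Layer:
    # Hypothetical layer 1: mail from explicitly trusted senders is accepted immediately
    return lambda msg: "ham" if msg["sender"] in trusted else None

def blocklist_layer(blocked: set) -> Layer:
    # Hypothetical layer 2: mail from known spam sources is rejected immediately
    return lambda msg: "spam" if msg["sender"] in blocked else None

def keyword_layer(keywords: List[str], threshold: int = 2) -> Layer:
    # Hypothetical layer 3: flags mail containing at least `threshold` suspicious phrases
    def check(msg: Message) -> Optional[str]:
        body = msg["body"].lower()
        hits = sum(1 for k in keywords if k in body)
        return "spam" if hits >= threshold else None
    return check

def classify(msg: Message, layers: List[Layer], default: str = "ham") -> str:
    # Pass the message down the layers; the first layer that reaches a verdict decides
    for layer in layers:
        verdict = layer(msg)
        if verdict is not None:
            return verdict
    return default

layers = [
    allowlist_layer({"boss@example.org"}),
    blocklist_layer({"bulk@example.com"}),
    keyword_layer(["free", "winner", "click here"]),
]
```

Ordering cheap checks (sender lists) before content analysis means most messages never reach the more expensive layers. Any real classifier could be appended as a further layer with the same signature.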
This is the presentation for a Machine Learning assignment at Dublin City University, Spring 2017. In this project, we wrote email spam-filtering code using the Enron dataset.
Differential evolution detection models for SMS spam (IJECEIAES)
With the growth of mobile phones, short message service (SMS) became an essential text communication service. However, the low cost and ease of use of SMS led to an increase in SMS spam. In this paper, the characteristics of SMS spam are studied and a set of features is introduced to filter out SMS spam. In addition, SMS spam detection is addressed as a clustering analysis problem that requires a metaheuristic algorithm to find the clustering structures. Three differential evolution variants, namely DE/rand/1, jDE/rand/1 and jDE/best/1, are adopted to solve the SMS spam problem. Experimental results show that jDE/best/1 produces the best results among the variants in terms of accuracy, false-positive rate and false-negative rate. Moreover, it outperforms the baseline methods.
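For readers unfamiliar with the DE/rand/1 notation, here is a minimal Python sketch of DE/rand/1 with binomial crossover. It minimizes a simple sphere function rather than the paper's clustering objective. The parameter values (F = 0.5, CR = 0.9, population 20) are common defaults, not the authors' settings. The jDE variants differ mainly in self-adapting F and CR per individual, which this sketch does not do. DE/best/1 uses the current best vector as the base instead of a random one.

```python
import random

def de_rand_1_bin(objective, bounds, pop_size=20, F=0.5, CR=0.9, generations=200, seed=1):
    """Minimize `objective` over a box with DE/rand/1/bin (illustrative settings)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # DE/rand/1 mutation: three distinct random vectors, none equal to target i
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
            # Binomial crossover; j_rand guarantees at least one mutant component
            j_rand = rng.randrange(dim)
            trial = [
                mutant[d] if (rng.random() < CR or d == j_rand) else pop[i][d]
                for d in range(dim)
            ]
            # Clip the trial vector to the search bounds
            trial = [min(max(v, lo), hi) for v, (lo, hi) in zip(trial, bounds)]
            f_trial = objective(trial)
            if f_trial <= fitness[i]:  # greedy one-to-one selection
                pop[i], fitness[i] = trial, f_trial
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]

# Stand-in objective: the sphere function, minimum 0 at the origin
sphere = lambda x: sum(v * v for v in x)
best_x, best_f = de_rand_1_bin(sphere, [(-5, 5)] * 3)
```

In a clustering formulation, each vector would instead encode candidate cluster centres, and the objective would be a clustering-quality measure.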
Indonesian language email spam detection using N-gram and Naïve Bayes algorithm (journalBEEI)
Indonesia ranks eighth among the world's countries as a source of global spam. A web-based spam-filter service with a REST API can be used to detect Indonesian-language email spam on an email server or in various email server applications. With a REST API, applications exchange data in JSON format using existing HTTP methods. One commonly used type of spam filter is Bayesian filtering, in which the Naïve Bayes algorithm serves as the classification algorithm. In this study, the N-gram method is used to increase the accuracy of the Naïve Bayes implementation. N-gram and Naïve Bayes algorithms for detecting Indonesian-language spam email were successfully implemented, with accuracy from 0.615 to 0.94, precision from 0.566 to 0.924, recall from 0.96 to 1.00, and F-measure from 0.721 to 0.942. The best results were obtained with the 5-gram method: accuracy 0.94, precision 0.924, recall 0.96 and F-measure 0.942.
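A minimal Python sketch of an n-gram Naïve Bayes classifier follows. The abstract does not say whether word or character n-grams were used. This sketch assumes character n-grams with Laplace (add-one) smoothing, and the Indonesian training phrases are hypothetical toy data. The REST/JSON service layer is not shown.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(text, n):
    # Pad with spaces so word boundaries contribute n-grams too
    text = f" {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class NGramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative assumption)."""

    def __init__(self, n=3, alpha=1.0):
        self.n, self.alpha = n, alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.gram_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(char_ngrams(text, self.n))
        self.vocab = {g for c in self.gram_counts.values() for g in c}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.gram_counts.items()}
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.class_counts.values())
        V = len(self.vocab)

        def log_posterior(c):
            # log P(c) + sum of log P(gram | c), with add-alpha smoothing
            score = math.log(self.class_counts[c] / total_docs)
            for g in grams:
                score += math.log(
                    (self.gram_counts[c][g] + self.alpha) / (self.totals[c] + self.alpha * V)
                )
            return score

        return max(self.class_counts, key=log_posterior)

# Hypothetical toy training data
texts = [
    "gratis hadiah menang undian sekarang",
    "klik link ini dapatkan hadiah gratis",
    "rapat proyek besok pagi di kantor",
    "tolong kirim laporan proyek hari ini",
]
labels = ["spam", "spam", "ham", "ham"]
model = NGramNaiveBayes(n=3).fit(texts, labels)
```

Working in log space avoids numerical underflow when many n-gram probabilities are multiplied. The parameter `n` corresponds to the n-gram sizes compared in the abstract, where 5-grams scored best.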
Natural language processing (NLP) is an area of computer science and artificial intelligence concerned with the interactions between computers and human (natural) languages.
Spam consists of unwanted and undesirable emails that are mass-sent to numerous victims. The main challenges facing Persian-speaking subscribers include three factors. The first is the growing penetration of spam into electronic processors and communication equipment such as computers and mobile phones. The second is the lack of control over information shared on the Internet and other communication networks. The third is the inefficiency of spam-detection methods developed for Persian contexts. This paper presents a novel and efficient method for thematic identification of Persian spam. The proposed method can identify both Persian spam and “Penglish” spam. “Penglish”, a blend of the words Persian and English, denotes Persian text written in English letters. Based on an experimental analysis of 10,000 spam messages of different types, the efficiency of the proposed method is evaluated at more than 98%. The method can also update its databases using feedback received from users.
Analysis of an image spam in email based on content analysis (ijnlc)
Researchers initially addressed spam detection as a text classification or categorization problem. However, as spammers continue to develop new techniques and email content becomes more varied, text-based anti-spam approaches alone are not sufficient to prevent spam. To defeat anti-spam technologies, spammers have recently adopted the image-spam trick, which makes scrutiny of an email's body text ineffective. The main idea behind this project is to design a spam-detection system that analyzes the content of emails, in particular artificially generated images sent as attachments. The system analyzes the image content, classifies the embedded image as spam or legitimate, and classifies the email accordingly.
GENDER AND AUTHORSHIP CATEGORISATION OF ARABIC TEXT FROM TWITTER USING PPM (ijcsit)
In this paper we present gender and authorship categorisation of Arabic text from Twitter using the Prediction by Partial Matching (PPM) compression scheme. The PPMD variant of the compression scheme was used with different orders to perform the categorisation. We also applied other machine learning algorithms, namely Multinomial Naïve Bayes (MNB), K-Nearest Neighbours (KNN) and an implementation of Support Vector Machine (LIBSVM), applying the same processing steps for all the algorithms. PPMD shows significantly better accuracy than all the other machine learning algorithms, with order-11 PPMD working best, achieving 90% and 96% accuracy for gender and authorship respectively.
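The idea behind compression-based categorisation is to assign a text to the class whose model encodes it in the fewest bits. That idea can be sketched in Python with a simplified order-k character model. This is not PPMD: it uses additive smoothing instead of PPM's escape mechanism and blending of lower-order contexts. The toy training texts are hypothetical.

```python
import math
from collections import Counter, defaultdict

class CharContextModel:
    """Order-k character model with additive smoothing (a simplified stand-in for PPM)."""

    def __init__(self, text, order=2, alpha=0.5):
        self.order, self.alpha = order, alpha
        self.alphabet = set(text)
        self.counts = defaultdict(Counter)
        padded = "\x00" * order + text  # pad so the first characters have a context
        for i in range(order, len(padded)):
            self.counts[padded[i - order:i]][padded[i]] += 1

    def bits(self, text):
        """Estimated code length of `text` in bits under this model."""
        V = len(self.alphabet | set(text))
        padded = "\x00" * self.order + text
        total = 0.0
        for i in range(self.order, len(padded)):
            ctx = self.counts.get(padded[i - self.order:i], Counter())
            p = (ctx[padded[i]] + self.alpha) / (sum(ctx.values()) + self.alpha * V)
            total -= math.log2(p)
        return total

def categorise(text, models):
    # Assign the label whose model would compress the text into the fewest bits
    return min(models, key=lambda label: models[label].bits(text))

# Hypothetical toy "training corpora", one per category
models = {
    "a": CharContextModel("abab abab abab abab"),
    "x": CharContextModel("xyz xyz xyz xyz"),
}
```

In the paper's setting, each category (a gender or an author) would have its own model trained on that category's tweets. The order corresponds to the PPM order that the abstract reports tuning, where order 11 scored best.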
The International Journal of Engineering and Science (The IJES) (theijes)
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
A survey on Stack Path Identification and Encryption Adopted as Spoofing Defe... (IOSR Journals)
Abstract: Spoofing attacks are a constant nuisance in the information world. Many methodologies have been invented to reduce their effects, but much is still left to be desired. The impact of these attacks on Electronic Payment Systems is highly detrimental to the economy, given that these systems are viewed as performance enhancers for payments. This study elaborates two methodologies, Stack Pi and Encryption, combined as a spoofing defense. Billions of shillings are lost to these attacks, a situation that deserves undivided attention and further research. The study offers a thorough discussion of three topics: the methodologies that have been used to eradicate spoofing attacks, their limitations, and the methodologies brought into play to succeed them. This discussion leads to an interesting strategy of integrating or combining methodologies, and to the benefits this strategy brings to curbing spoofing attacks. With that knowledge in hand, it can be justified why the combination of Stack Pi and Encryption is a recommended solution against spoofing attacks.
Keywords: Electronic Payment Systems, Encryption, Security, Spoofing attacks, Stack Pi
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR... (IJNSA Journal)
Email is one of the most popular communication media of the current century; it has become an effective and fast method of sharing and exchanging information all over the world. In recent years, email users have faced the problem of spam: unsolicited bulk emails sent by spammers. Spam consumes mail-server storage and network bandwidth and wastes users' time. Many spam-filtering methods classify email messages into two groups, spam and non-spam. In general, one of the most powerful tools for data classification is the Artificial Neural Network (ANN), which can handle large amounts of high-dimensional data with good accuracy. One important type of ANN is the Radial Basis Function Neural Network (RBFNN), which is used in this work to classify spam messages. In this paper, we present a new spam-filtering approach that combines RBFNN and the Particle Swarm Optimization (PSO) algorithm (HC-RBFPSO). The proposed approach uses PSO to optimize the RBFNN parameters through its evolutionary heuristic search process. PSO optimizes the positions of the RBFNN centers c. Within each PSO iteration, the radii r are optimized using the K-Nearest Neighbors algorithm and the weights w using the Singular Value Decomposition algorithm, guided by the fitness (error) function. The experiments are conducted on the SPAMBASE dataset downloaded from the UCI Machine Learning Repository. The experimental results show how our approach performs in accuracy compared with other approaches that use the same dataset.
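A minimal particle swarm optimization loop is sketched below in Python. In HC-RBFPSO, each particle would encode RBFNN centers, and the fitness would be the network error after fitting the radii (KNN) and weights (SVD). Here, as a simplifying assumption, the fitness is a hypothetical stand-in that measures how well two 1-D centers cover four sample points. The inertia and acceleration constants are common textbook values, not the paper's.

```python
import random

def pso(fitness, bounds, n_particles=30, iterations=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` with a basic global-best particle swarm (illustrative settings)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position so far
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]    # swarm-wide best position
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

# Hypothetical stand-in fitness: squared distance from each point to its nearest center
points = [0.0, 0.2, 5.0, 5.2]

def center_error(centers):
    return sum(min((x - c) ** 2 for c in centers) for x in points)

best, best_f = pso(center_error, [(-1, 6), (-1, 6)])
```

The optimum of this toy fitness places one center near 0.1 and the other near 5.1, for an error of 0.04. A full HC-RBFPSO fitness would replace `center_error` with the classification error of an RBF network built from the particle's centers.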
With technological advancements and the growth of content advertising on mobile phones, SMS use has grown substantially, which has led to unsolicited spam SMS messages being sent to users; according to reports, the volume of SMS spam is increasing steadily. These spam messages can also lead to the loss of personal data. SMS spam detection is a relatively new area, and systematic literature reviews of it are insufficient. SMS spam is often handled with various machine learning techniques in a task called SMS spam filtering, which separates spam from ham. This paper treats spam detection as a basic two-class document classification problem. The classification combines a classification algorithm with feature extraction over different collected datasets to filter the messages. In this paper, we focus on creating a Naïve Bayes model for spam message identification and use Flask, a web-service micro-framework in Python, to build an API for the model. The comparison was performed using machine learning and different algorithmic techniques. Kavya P | Dr. A. Rengarajan, "A Comparative Study for SMS Spam Detection", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-1, December 2020, URL: https://www.ijtsrd.com/papers/ijtsrd38094.pdf Paper URL: https://www.ijtsrd.com/computer-science/other/38094/a-comparative-study-for-sms-spam-detection/kavya-p
Foodservice Consultant Q4 2016 - Water: The New Craft Beverage - Water is a growing priority for foodservice operators, but usually in terms of resource costs. New technology is promoting a different view. Bottled water is gathering greater prestige as a product yet with lower costs and higher margins, reports Jim Banks.
Most Awaited New Launch: SKA GREENARCH, Noida Extension
Key Stats of SKA GREENARCH
1 of the best new launches of 2016, with the lowest prices in Noida Extension
2 of the biggest developers, SKA and Green Arch, join hands in Noida Extension
3-side open plot with undoubtedly the best site layout in Noida Extension
4 high-speed lifts in each tower, unmatched in Noida Extension
5 acres of project land with only 4 towers, historic in Noida Extension
Only 100 bookings will be accepted at the introductory price. Not a single unit beyond 100 will be accepted at this price.
Public Maritime Domain of Lorient-Keroman. Census, mapping, analy... (AudéLor)
AudéLor invites you to (re)discover some of its flagship work through a series of educational, concise and illustrated posters. The subjects are as varied as the Lorient coastline and its atmospheres, the economic areas of the Lorient roadstead, the urban silhouette of the Keroman peninsula, and the analysis of the public domain of the Lorient Keroman fishing port.
Feel free to download them in A3 format and share them widely!
IAS SLA-Track guard tour monitoring ver1 (indusaviation)
SLA-Track is a guard tour monitoring system developed by Indus Aviation Systems. SLA-Track helps you secure your property and valuable assets. Guard tour monitoring systems ensure proof of presence during patrols.
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA... (IJNSA Journal)
Electronic mail, commonly known as email, is a crucial technology that enables streamlined operations and communication in corporate environments. By enabling swift and dependable transactions, email drives higher productivity and organizational effectiveness. However, its versatility also makes it susceptible to misuse by cybercriminals engaging in activities such as hacking, spoofing, phishing, email bombing, whaling and spamming. As a result, effective and efficient data analysis is important for preventing and detecting cyber-attacks and crime in time. To overcome these challenges, this paper uses a novel approach, Aquila Optimization (AO), to find the best set of hyperparameters for the Stacked Auto Encoder (SAE) classifier. The purpose of tuning the SAE hyperparameters with AO is to obtain higher text-classification accuracy. The optimized SAE then classifies the selected features into different classes. The experimental results showed that the proposed AO-SAE model outperforms existing models such as Logistic Regression (LR) and a Long Short-Term Memory based Gated Recurrent Unit (LSTM based GRU) in terms of accuracy.
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMS (IJNSA Journal)
Email systems have suffered from degraded quality of service due to rampant spam, phishing and fraudulent emails. This is partly because the classification speed of email filtering systems falls far behind the requirements of email service providers. We are motivated to address this issue from the perspective of computer architecture support. In this paper, as the first step towards novel architecture designs, we present extensive performance data collected from measurement and profiling experiments using representative email filtering systems including CRM114, DSPAM, SpamAssassin and TREC Bogofilter. We provide detailed analysis of the time consuming functions in the systems under study. We also show how the processor architecture parameters affect the performance of these email filters through simulation experiments.
MINIMIZING THE TIME OF SPAM MAIL DETECTION BY RELOCATING FILTERING SYSTEM TO ... (IJNSA Journal)
Unsolicited Bulk Emails (also known as spam) are undesirable emails sent to a massive number of users. Spam emails consume network resources and cause many security uncertainties. As we found, the location where the spam filter operates is an important parameter for preserving network resources. Although there are many methods to block spam emails, most developers only aim to keep spam from being delivered to their clients. In this paper, we introduce a new and efficient approach that prevents spam emails from being transferred at all. The results show that developing the spam filter for the sender's mail server, rather than the receiver's, lets spam be detected in the shortest time and so avoids wasting network resources.
E-Mail Security Using Spam Mail Detection and Filtering System (rahulmonikasharma)
Electronic mail, also known as email or e-mail, is a method of exchanging digital messages from an author to one or more recipients. Email is the most efficient way to communicate or transfer data from one person to another. When transferring data or communicating through email, there is a possibility of misbehavior. In the existing system, the spam method is used to avoid receiving unwanted email. Email spam, also known as unsolicited bulk email (UBE), junk mail, or unsolicited commercial email (UCE), is the practice of sending unwanted email messages, frequently with commercial content, in large quantities to an indiscriminate set of recipients. However, the spam method offers no way to prevent unwanted messages or emails from being received. To solve this, we propose the concept of an email misbehavior blocking system, which permanently blocks incoming unwanted messages or emails.
E-Mail Security Using Spam Mail Detection and Filtering Systemrahulmonikasharma
Electronic mail, also known as email or e-mail, is a method of exchanging digital messages from an author to one or more recipients. Email is the most efficient way to communicate or transfer our data from one to another. While transferring or communicating through email there is the possibility of misbehave. In the existing system Spam method is used to avoid the unwanted Email receiving. Email spam, also known as unsolicited bulk Email (UBE), junk mail, or unsolicited commercial email (UCE), is the practice of sending unwanted email messages, frequently with commercial content, in large quantities to an indiscriminate set of recipients. But in Spam method there is no way to prevent the unwanted messages or Email receiving. To solve these unwanted messages or Email receiving we propose the concept Email misbehave blocking system. In the proposed method we permanently prevent the incoming unwanted messages or Email through blocking system.
lectronic-mail is widely used most suitable method of transferring messages electronically from one
person to another, rising from and going to any part of the world. Main features of Electronic mail is its speed,
dependability, well-equipped storage options and a large number of added services make it highly well-liked
among people from all sectors of business and society. But being popular it also has negative side too. Electronics
mails are preferred media for a large number of attacks over the internet.. A number of the most popular attacks over
the internet include spams. Some methods are essentially in detection of spam related mails but they have higher false
positives. A number of filters such as Checksum-based filters, Bayesian filters, machine learning based and
memory-based filters are usually used in order to recognize spams. As spammers constantly try to find a way to
avoid existing filters, a new filters need to be developed to catch spam. This paper proposes to find an
resourceful spam mail filtering method using user profile base ontology. Ontologies permit for machineunderstandable
semantics of data. It is main to interchange information with each other for more efficient spam
filtering. Thus, it is essential to build ontology and a framework for capable email filtering. Using ontology that is
particularly designed to filter spam, bunch of useless bulk email could be filtered out on the system. We propose a
user profile-based spam filter that classifies email based on the likelihood that User profile within it have been
included in spam or valid email.
International Journal of Computer Applications Technology and Research
Volume 5, Issue 2, 61-65, 2016, ISSN: 2319-8656
www.ijcat.com 61
Identifying Valid Email Spam Emails Using Decision Tree
Hamoon Takhmiri
Computer Science and Technology
Islamic Azad University
Kish International Branch
Kish Island, Iran
Ali Haroonabadi
Islamic Azad University
Kish International Branch
Kish Island, Iran
Abstract: With the increasing use of e-mail and the growing number of Internet users sending unsolicited bulk e-mail, the need for anti-spam filtering has arisen. Many filters have been produced in this area, each with its own method and set of parameters for recognizing spam. The advantage of the proposed method is the simultaneous use of two fuzzy algorithms: an ID3 decision tree with Mamdani inference and a Naive Bayesian classifier. Both algorithms first detect spam, and a Bagging approach then makes the final identification. For evaluation, a dataset of one thousand e-mails was analyzed with the Weka software; the resulting charts show improved spam-detection accuracy compared with previous methods.
Keywords: Spam; Fuzzy Decision Tree; ID3 Algorithm; Naive Bayesian; Anti-Spam
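The abstract combines an ID3 decision tree with a second classifier through Bagging. As a rough illustration of two of those ingredients only, the Python sketch below trains depth-one ID3 trees (stumps whose split is chosen by information gain) on bootstrap resamples of the training set and combines them by majority vote. It is a sketch under stated assumptions, not the authors' implementation: it assumes binary keyword features, the feature names and toy messages are hypothetical, and it omits the paper's fuzzy (Mamdani) extension and its Naive Bayesian component. Note also that standard bagging resamples the data for one learner type, which may differ from how the paper merges its two algorithms.

```python
import math
import random
from collections import Counter

# Each message is (set of binary keyword features, label), label "spam" or "ham".
# Feature names and toy data are hypothetical, for illustration only.

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(data, feature):
    """ID3 split criterion: entropy reduction from splitting on a binary feature."""
    gain = entropy([label for _, label in data])
    for value in (True, False):
        subset = [label for feats, label in data if (feature in feats) == value]
        if subset:
            gain -= len(subset) / len(data) * entropy(subset)
    return gain

def train_stump(data, features):
    """Depth-one ID3 tree: split on the highest-gain feature, majority label per branch."""
    best = max(features, key=lambda f: information_gain(data, f))
    default = Counter(label for _, label in data).most_common(1)[0][0]
    branches = {}
    for value in (True, False):
        subset = [label for feats, label in data if (best in feats) == value]
        # An empty branch falls back to the overall majority label.
        branches[value] = Counter(subset).most_common(1)[0][0] if subset else default
    return best, branches

def predict_stump(stump, feats):
    feature, branches = stump
    return branches[feature in feats]

def bagging(data, features, n_models=11, seed=0):
    """Bootstrap aggregating: each stump sees a same-size resample drawn with replacement."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data], features)
            for _ in range(n_models)]

def predict_bagged(models, feats):
    """Final label is the majority vote of the ensemble."""
    return Counter(predict_stump(m, feats) for m in models).most_common(1)[0][0]

# Hypothetical usage on a tiny hand-made training set.
training = [
    ({"free", "winner"}, "spam"),
    ({"free", "offer"}, "spam"),
    ({"winner", "prize"}, "spam"),
    ({"meeting", "report"}, "ham"),
    ({"report", "free"}, "ham"),
    ({"meeting", "agenda"}, "ham"),
]
vocabulary = ["free", "winner", "offer", "prize", "meeting", "report", "agenda"]
ensemble = bagging(training, vocabulary)
print(predict_bagged(ensemble, {"winner", "prize"}))
```

An odd `n_models` avoids ties in the two-class vote. Because each stump is trained on a different resample, the ensemble is less sensitive to any single unusual training message than one stump would be.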
1. INTRODUCTION
Today, unsolicited email, called spam, has turned into a serious problem, and 80% of unwanted email is spam. Spam causes many problems: it generates network traffic and wastes storage space and other resources. Users spend a lot of time sorting and deleting unwanted messages, and spam also leaves them feeling insecure. Spam is a vehicle for illegal content and schemes such as pornography, pyramid schemes, and financial scams such as phishing sites. In recent years, the popularity and low cost of email have attracted direct marketers, some of whom deceive users with promises of lottery winnings and valuable prizes. Large lists of email addresses, usually harvested from web pages and newsgroup archives, make it possible to send unsolicited email to thousands of recipients at almost no cost. Users receive large amounts of spam advertising anything from holidays to get-rich-quick schemes. The term unsolicited commercial email is also used in the literature [1].
The term spam, however, is used in a wider sense. Spam annoys most users because it wastes their time and clutters their inbox. It also costs users money on dial-up connections, consumes bandwidth, and may expose them to irrelevant subjects with inappropriate content such as advertising for vulgar sites. Ferris Research estimated that economic losses resulting from spam exceeded 50 million dollars [2].
2. RELATED WORK
Filters have usually relied on keyword patterns. To be effective and to avoid accidentally removing legitimate messages (called ham), these patterns must be tuned against each user's received email. However, fine-tuning such patterns requires time and expertise, which are unfortunately not always available [3]. Moreover, the characteristics of messages change over time, so the keyword patterns must be updated. Automatically processing the spam and ham messages that have already been received is therefore desirable, and text categorization methods can be effective in anti-spam filtering. Unlike most text categorization tasks, however, what makes a message spam is not only its content but the fact that it is unsolicited and sent indiscriminately in bulk. More generally, the patterns to be recognized can be images, sounds, or any other data; the point is to distinguish between samples and react according to the type of each one. Learning is usually based on one of the following approaches: statistical, combined, or neural.
Statistical pattern recognition assumes that patterns are generated by a random process and classifies them based on their statistical characteristics. The most important motives for sending spam include economic goals; advertising a product, a service, or a particular idea; tricking users into disclosing private information; transmitting malicious software to users' computers; causing temporary failures in email servers; generating traffic; and broadcasting immoral content [4].
Spammers constantly change the content and form of their messages so that anti-spam filters cannot recognize them. Methods for preventing the spread of spam include:
- economic methods, such as paying to send email through modified email protocols
- legislative methods, such as the CAN-SPAM law and secure email transfer infrastructure
- changing email transfer protocols or offering alternative protocols, such as Sender ID
- controlling outgoing and incoming email
- learning-based (statistical) filtering using mail features
- detecting phishing mail (fraudulent pages) with fuzzy classification methods
3. SUGGESTED METHOD
To detect spam better, the first goal is to find its behavioral characteristics. This requires extracting data and logging events that describe spam behavior, such as the sender's IP address, sending time, volume, etc., as shown in Table 1. These data are stored in a database, so they are structured data [5].
The behavioral characteristics of spam can be extracted from its mail servers. Before extraction, the characteristics of emails must be analyzed from the server reports. Data mining techniques are chosen to analyze these characteristics: the main characteristics are retained, and characteristics with little data or a weak connection to the class are deleted. The behavioral features of a single email are as follows:
- Customer IP (CIP)
- Receive Time (RT)
- Context Length (CL)
- Frequency (FRQ)
- Context Type (CT)
- Protocol Validation (PV)
- Receiver Number (RN)
- Attachment Number (AN)
- Server IP (SIP)
Table 1 Mail Log Format
In the real world, features are not entirely crisp, so they are fuzzified to characterize the samples logically and naturally. After preprocessing, the feature values are as follows:
A) Customer IP (CIP): used only to calculate the sender's frequency and to extract common patterns of sender behavior; it is not used in the decision tree calculations.
B) Receive Time (RT): time of day or night is a continuous value and must be fuzzified into a membership degree in [0, 1].
C) Context Length (CL): short or long, depending on the size of the email; a continuous value that must be fuzzified.
D) Protocol Validation (PV): Boolean; 1 when the protocol matches the sender and 0 on a mismatch.
E) Context Type (CT): 1 when the type is text/html or multipart, and 0 when the type is plain text.
F) Receiver Number (RN): many or few receivers; a continuous value that must be fuzzified.
G) Frequency (FRQ): frequent or seldom; a continuous value that must be fuzzified.
H) Attachment Number (AN): many or few attachments; a continuous value that must be fuzzified.
Table 2 lists some examples of preprocessing results.
Table 2 Attributes From Mail Logs
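Items B, C, F, G and H call for fuzzification, but the paper does not publish its membership functions. The Python sketch below assumes simple piecewise-linear (ramp) memberships; every breakpoint, and the record keys such as `hour`, `length` and `receivers`, are hypothetical placeholders rather than the authors' values. PV and CT stay crisp, as in items D and E.

```python
"""Fuzzify preprocessed mail-log features into membership degrees in [0, 1].

All breakpoints are hypothetical placeholders; the paper does not publish them.
"""


def ramp_up(x: float, low: float, high: float) -> float:
    """Membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def ramp_down(x: float, low: float, high: float) -> float:
    """Complement of ramp_up: 1 at or below `low`, 0 at or above `high`."""
    return 1.0 - ramp_up(x, low, high)


def night_degree(hour: float) -> float:
    """Degree to which a receive time (RT, hour 0-24) counts as 'night'.

    Hypothetical: full night from 23:00 to 05:00, fading over one hour each side.
    """
    evening = ramp_up(hour, 22.0, 23.0)   # rises from 22:00 to 23:00
    morning = ramp_down(hour, 5.0, 6.0)   # falls from 05:00 to 06:00
    return max(evening, morning)


def fuzzify(record: dict) -> dict:
    """Map one preprocessed mail record to fuzzy and crisp feature values."""
    return {
        "RT_night": night_degree(record["hour"]),
        "CL_long": ramp_up(record["length"], 2_000, 20_000),    # bytes, hypothetical
        "CL_short": ramp_down(record["length"], 2_000, 20_000),
        "FRQ_high": ramp_up(record["frequency"], 5, 50),         # mails per day, hypothetical
        "RN_high": ramp_up(record["receivers"], 3, 30),
        "AN_high": ramp_up(record["attachments"], 1, 4),
        "PV": 1 if record["protocol_valid"] else 0,             # crisp, item D
        # crisp, item E: text/html or multipart -> 1, plain text -> 0
        "CT": 1 if record["content_type"].startswith(("text/html", "multipart")) else 0,
    }


example = fuzzify({"hour": 23.5, "length": 11_000, "frequency": 60,
                   "receivers": 1, "attachments": 0,
                   "protocol_valid": True, "content_type": "multipart/mixed"})
```

Ramps were chosen only because they are the simplest shapes that map a crisp value into [0, 1]; triangular or Gaussian memberships would slot into `fuzzify` the same way.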
Let A and B be fuzzy subsets defined on a finite space F. A fuzzy rule is written A → B, where A is called the fuzzy condition set and B the fuzzy conclusion set. The knowledge represented by a fuzzy decision tree can be expressed as if-then rules: each path from the root to a leaf yields one rule. Each feature value along the path is one conjunct (joined by AND) of the rule's antecedent, the IF part, and the class of the leaf gives the consequent, the THEN part. If-then rules are easier to understand, especially when the tree is large [6].
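A minimal Python sketch of the path-to-rule idea, assuming a decision tree stored as nested tuples. The tree fragment is hypothetical: it follows the shape of rules 1-3 below, but its ham leaves are simplified stand-ins for branches that the paper's tree continues further.

```python
"""Turn each root-to-leaf path of a decision tree into one IF-THEN rule.

Hypothetical tree: an internal node is (feature, {value: subtree}),
and a leaf is a class label string.
"""
from typing import Union

Tree = Union[str, tuple]

TREE: Tree = (
    "PV",
    {
        "invalid": "spam",
        "valid": (
            "CL",
            {
                "large": ("CT", {"multipart": "spam", "text": "ham"}),
                "short": ("FRQ", {"high": "spam", "low": "ham"}),
            },
        ),
    },
)


def extract_rules(tree: Tree, path: tuple = ()) -> list:
    """Depth-first walk; each leaf yields (antecedent conjuncts, class label)."""
    if isinstance(tree, str):          # leaf: the accumulated path is the IF part
        return [(path, tree)]
    feature, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, path + ((feature, value),)))
    return rules


def format_rule(antecedent: tuple, label: str) -> str:
    """Join the conjuncts with AND and append the consequent."""
    conds = " AND ".join(f"{feature} is {value}" for feature, value in antecedent)
    return f"IF {conds} THEN {label}"


for ante, label in extract_rules(TREE):
    print(format_rule(ante, label))
```

The first rule printed is `IF PV is invalid THEN spam`, which corresponds to rule 1 below.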
Figure 1 Decision Tree
After examining the decision tree in Figure 1 and identifying the important features of an email, the Mamdani rules generated from the decision tree are as follows:
1) If the protocol of the email (PV) is not valid, then the email is spam.
2) If the protocol (PV) is valid, the context length (CL) is large, and the context type (CT) is multipart, then the email is spam.
3) If the protocol (PV) is valid, the context length (CL) is short, and the frequency (FRQ) is high, then the email is spam.
4) If the protocol (PV) is valid, the context length (CL) is short, the frequency (FRQ) is low or seldom, the receive time (RT) is night, and the receiver number (RN) is high, then the email is spam.
5) If the protocol (PV) is valid, the context length (CL) is short, the frequency (FRQ) is low or seldom, the receive time (RT) is night, the receiver number (RN) is low, and the context type (CT) is multipart, then the email is spam.
6) If the protocol (PV) is valid, the context length (CL) is short, the frequency (FRQ) is low or seldom, the receive time (RT) is night, the receiver number (RN) is low, the context type (CT) is multipart, and the attachment number (AN) is low or high, then the email is spam.
7) If the sender's mail server is not valid and reliable, then the email is spam.
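One way to read these rules Mamdani-style is to take AND as min, NOT as the complement 1 - x, and aggregate rules that share the spam conclusion with max. The sketch below does this for rules 1-5 on already-fuzzified inputs; the input keys and the 0.5 decision threshold are assumptions, and because the consequents are crisp class labels no defuzzification step is shown. Rule 6 is omitted because, as written, its conditions extend those of rule 5, so under min it can never fire more strongly; rule 7 would need a server-validity input that Table 1 does not list.

```python
"""Evaluate rules 1-5 Mamdani-style: AND -> min, NOT -> 1 - x, aggregation -> max.

Input keys and the 0.5 threshold are hypothetical.
"""


def spam_degree(x: dict) -> tuple:
    """Return the aggregated spam degree and each rule's firing strength."""
    pv = x["PV"]                        # 1 = protocol valid, 0 = invalid (crisp)
    frq_low = 1.0 - x["FRQ_high"]       # fuzzy NOT as the standard complement
    rn_low = 1.0 - x["RN_high"]
    strengths = [
        1.0 - pv,                                                         # rule 1
        min(pv, x["CL_long"], x["CT"]),                                   # rule 2
        min(pv, x["CL_short"], x["FRQ_high"]),                            # rule 3
        min(pv, x["CL_short"], frq_low, x["RT_night"], x["RN_high"]),     # rule 4
        min(pv, x["CL_short"], frq_low, x["RT_night"], rn_low, x["CT"]),  # rule 5
    ]
    return max(strengths), strengths    # max aggregates rules concluding "spam"


def classify(x: dict, threshold: float = 0.5) -> str:
    degree, _ = spam_degree(x)
    return "spam" if degree >= threshold else "ham"


example = {"PV": 1, "CL_long": 0.2, "CL_short": 0.8, "CT": 0,
           "FRQ_high": 0.3, "RT_night": 0.9, "RN_high": 0.6}
degree, strengths = spam_degree(example)   # rule 4 fires most strongly here
```

For the example input, rule 4 fires with strength 0.6 and the email is classed as spam; setting `PV` to 0 makes rule 1 fire fully.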
First, the spam measures are determined; they consist of two parts, implicit and tacit. Implicit measures, such as protocol type, context length, context type, time, frequency, receiver number and attachment number, are analyzed by Mamdani's fuzzy decision tree. Tacit measures, such as the frequency of words like "free" and "money" or of three zeros in a row, are analyzed by the Naive Bayesian method. In fact, the dataset is a combination of implicit characteristics, used in the fuzzy Mamdani decision tree, and tacit characteristics, used in the Naive Bayesian method. The implicit characteristics of the dataset are analyzed by the decision tree algorithm (ID3), and the results are completed by fuzzy Mamdani rules [7].
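ID3 chooses each split by information gain. The following is a minimal Python sketch of that criterion on crisp, already discretized attributes; the six rows are hypothetical and not taken from the paper's dataset. Fuzzy ID3 variants generally replace class counts with sums of membership degrees, which this sketch does not do.

```python
"""ID3's split criterion: choose the attribute with the highest information gain."""
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(rows, attr, target="class") -> float:
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    base = entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder


def best_attribute(rows, attrs, target="class"):
    return max(attrs, key=lambda a: information_gain(rows, a, target))


# Hypothetical discretized mail-log rows.
ROWS = [
    {"PV": "invalid", "CL": "short", "class": "spam"},
    {"PV": "invalid", "CL": "large", "class": "spam"},
    {"PV": "valid", "CL": "large", "class": "spam"},
    {"PV": "valid", "CL": "short", "class": "ham"},
    {"PV": "valid", "CL": "short", "class": "ham"},
    {"PV": "valid", "CL": "large", "class": "ham"},
]
```

On these rows PV has the higher gain (about 0.459 versus about 0.082 for CL), so ID3 would place it at the root, consistent with PV's position in rule 1.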
Next, the tacit characteristics are examined using Naive Bayesian principles. Finally, the results of both algorithms are fed into the Bagging algorithm: each email in the dataset is passed to both the Naive Bayesian classifier and the decision tree, and whenever a method fails to classify it correctly (a false positive or false negative), a negative score is recorded for that method. The optimal weights may be found by trial and error. A bonus rate must also be computed: the target class value of the case (spam) is divided by the number of spam detection methods, and the result is divided by the number of emails in the dataset. The number of correctly classified emails and the number of incorrectly classified emails are each multiplied by the bonus rate, and the difference between these two products is added to the initial weight (0.5). This is done for both the Naive Bayesian and decision tree methods; because the Naive Bayesian method's threshold is more favorable, it is used as the final threshold. To obtain the final accuracy, each email is passed to both the Naive Bayesian classifier and the decision tree [8].
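The bonus-rate description is terse, so the sketch below is one reading of it, not the authors' code. It assumes a target class value of 1 (spam), two detection methods, an initial weight of 0.5, and hypothetical counts of correct and incorrect classifications on a 1000-email dataset.

```python
"""Per-method weight from the bonus-rate procedure (one reading of the text).

Assumptions: class value 1 (spam), two methods, initial weight 0.5.
"""


def bonus_rate(n_mails: int, n_methods: int = 2, class_value: float = 1.0) -> float:
    """Class value divided by the number of methods, then by the number of mails."""
    return class_value / n_methods / n_mails


def method_weight(correct: int, incorrect: int, n_mails: int,
                  initial: float = 0.5, n_methods: int = 2) -> float:
    """Initial weight plus (correct * rate) minus (incorrect * rate)."""
    rate = bonus_rate(n_mails, n_methods)
    return initial + correct * rate - incorrect * rate


# Hypothetical counts, not results from the paper.
w_nb = method_weight(correct=900, incorrect=100, n_mails=1000)
w_dt = method_weight(correct=850, incorrect=150, n_mails=1000)
```

With these hypothetical counts the weights come out at 0.9 and 0.85. Because correct plus incorrect equals the number of emails, this reading keeps each weight between 0 (every email wrong) and 1 (every email right).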
If both methods produce the same result, that result is the output; if they differ, priority is given to the Naive Bayesian method. To compute the final accuracy, the outputs are compared with the true class of each email (spam or ham). When a new email arrives, each method classifies it (ham = 0, spam = 1), each output is multiplied by the weight obtained for that method, and the weighted values are summed. For example, if only the tree flags the email as spam and the other method does not, the result has average reliability; if both methods agree (both detect spam or both detect ham), the result is highly reliable. In the final test, K-fold cross-validation with four folds is used: the dataset is divided into four parts, the first part is used for testing and the rest for training; next, the second part is used for testing and the first, third and fourth parts for training; then the third part is used for testing and the others for training; and finally the fourth part is used for testing and the others for training [9].
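A Python sketch of the combination rule and the four-fold protocol described above. It assumes ham = 0 and spam = 1, takes the per-method weights as inputs, returns the Naive Bayes label on disagreement as the text prescribes, and labels agreement as "highly reliable" and disagreement as "reliable", following the two categories named in the conclusion. The weighted score is returned for inspection only. The fold split is contiguous, so the data are assumed to be shuffled beforehand.

```python
"""Combine two binary classifier outputs and split data for 4-fold testing.

Assumptions: ham = 0, spam = 1; weights come from the bonus-rate step.
"""


def combine(y_nb: int, y_dt: int, w_nb: float, w_dt: float) -> tuple:
    """Return (label, weighted score, reliability category)."""
    score = w_nb * y_nb + w_dt * y_dt          # weighted sum of the two votes
    if y_nb == y_dt:
        return y_nb, score, "highly reliable"  # both methods agree
    return y_nb, score, "reliable"             # disagreement: Naive Bayes has priority


def k_fold_indices(n: int, k: int = 4):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    fold_size, extra = divmod(n, k)
    start = 0
    for i in range(k):
        stop = start + fold_size + (1 if i < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n))
        yield train, test
        start = stop


for train, test in k_fold_indices(1000):
    print(len(train), len(test))   # 750 training and 250 test emails per round
```

Returning the category alongside the label keeps the "reliable" versus "highly reliable" distinction available to whatever consumes the prediction.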
4. RESULTS AND DISCUSSION
The dataset on which the proposed method is implemented contains 1000 emails, of which 350 (35%) are spam and 650 (65%) are ham. The last column of the dataset is the class column, where 1 means spam and 0 means ham. Some examples of keywords for the non-implicit part, used in the Naive Bayesian implementation, are:
Money, Credit, 000, Internet, Edu, Talent, Free, Make, #, $, ...
The other part of the dataset contains implicit characteristics used in the fuzzy decision tree implementation, such as:
Sending time, Context type, Context length, Frequency, Receiver number, Sender's number, …
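For the keyword part, a from-scratch multinomial Naive Bayes classifier with Laplace (add-one) smoothing is sketched below. The vocabulary follows the keyword examples listed above; the four training messages are hypothetical, and the paper itself used Weka, whose classifier settings it does not report.

```python
"""Multinomial Naive Bayes over keyword tokens with Laplace smoothing.

Hypothetical training messages; class 1 = spam, class 0 = ham.
"""
import math
from collections import Counter

VOCAB = ["money", "credit", "000", "internet", "edu", "talent", "free", "make", "#", "$"]


def tokenize(text: str) -> list:
    """Keep only vocabulary terms; '$' and '#' are split off as their own tokens."""
    for sym in "$#":
        text = text.replace(sym, f" {sym} ")
    return [t for t in text.lower().split() if t in VOCAB]


def train(docs: list):
    """Return log priors and per-class log likelihoods for each vocabulary term."""
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(label for _, label in docs)
    for text, label in docs:
        counts[label].update(tokenize(text))
    log_prior = {c: math.log(n_docs[c] / len(docs)) for c in (0, 1)}
    log_lik = {}
    for c in (0, 1):
        total = sum(counts[c].values()) + len(VOCAB)   # add-one for every term
        log_lik[c] = {w: math.log((counts[c][w] + 1) / total) for w in VOCAB}
    return log_prior, log_lik


def predict(model, text: str) -> int:
    """Return the class with the highest log posterior score."""
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in tokenize(text))
              for c in (0, 1)}
    return max(scores, key=scores.get)


model = train([
    ("Make money fast, free credit $$$", 1),
    ("Free internet offer: make $1000", 1),
    ("Seminar on talent programs at the edu site", 0),
    ("Internet outage notice for edu staff", 0),
])
```

Working in log space avoids underflow when many keyword probabilities are multiplied together.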
The goal of testing on this dataset is to examine the detection accuracy of the proposed method and to show that it detects spam better than either the Naive Bayesian or the decision tree method alone. The dataset is first analyzed with the Naive Bayesian method, and its efficiency, accuracy, and weak and strong points are extracted; the same dataset is then analyzed with the decision tree, and the same measures are extracted. The results are then combined by voting based on the Bagging method: the method with better performance takes priority, and its detections are combined with the common detections (the intersection) of the two methods. To implement the Naive Bayesian method, the dataset is first loaded into the Weka software, and the input attributes are then chosen from the fields of the dataset [10].
To show its efficiency, the proposed method is compared with another method on the basis of accuracy and other evaluation criteria. The examined dataset is divided into ten sections and evaluated in groups of 100, 200, 300, ..., 1000 emails. The results are compared with those of a spam detection method that combines the negative selection algorithm with particle swarm optimization [11].
Figure 2 Precision Comparison Between Methods
Figure 3 F-Measure
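Figures 2 and 3 report precision and F-measure. The standard definitions, plus one reading of the cumulative 100, 200, ..., 1000 evaluation, are sketched below in Python; the paper does not list its confusion matrices, so any counts fed to these functions would be hypothetical.

```python
"""Precision, recall, F-measure and accuracy from confusion-matrix counts."""


def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


def cumulative_groups(y_true, y_pred, step: int = 100) -> list:
    """Score the first 100, 200, ... predictions (spam = 1, ham = 0)."""
    results = []
    for end in range(step, len(y_true) + 1, step):
        pairs = list(zip(y_true[:end], y_pred[:end]))
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        tn = sum(1 for t, p in pairs if t == 0 and p == 0)
        results.append((end, metrics(tp, fp, fn, tn)))
    return results
```

For example, hypothetical counts of 300 TP, 20 FP, 50 FN and 630 TN give a precision of 0.9375, an F-measure of 600/670 (about 0.896), and an accuracy of 0.93.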
5. CONCLUSION
This paper presents a new solution for detecting spam that uses a fuzzy decision tree, Naive Bayes, and a Bagging voting algorithm to extract spam's behavioral patterns. Because completely crisp characteristics do not exist in the real world, the features are fuzzified so that their membership degrees describe the samples neutrally and rationally. The fuzzy decision tree separates spam from ham using fuzzy Mamdani rules; the Naive Bayesian method then does the same on the chosen dataset using Bayes' formula; finally, the Bagging method divides the votes into smaller parts, computes optimized weights, and applies them to the obtained percentages to reach the final level of accuracy and correctness [12]. The proposed method not only performs better than either method used separately, but also, by using the intersection of the two methods' spam and ham detections (the TPs and TNs common to both methods), divides detections into two categories: reliable and highly reliable. One of the most important criteria in choosing an optimal spam detection method is minimizing the number of ham emails detected as spam, because finding and deleting a spam among ham emails is easy for users, while finding a ham email among spam is typically difficult and time-consuming. To improve the accuracy of spam detection, two methods are used, and the Bagging voting method with divided votes provides better spam detection. As discussed in the previous section, the comparison of the proposed method with previously published methods shows better performance in terms of accuracy. Adding a fuzzy preprocessing level that categorizes emails by content, subject, sender, time, number of receivers, number of senders, etc., and combining the Naive Bayesian, decision tree, and Bagging methods based on the tacit and implicit components of an email (classification by the two methods, voting by the Bagging algorithm), reduces the false positive and false negative rates, which improves the accuracy of statistical spam filters and decreases detection errors [13].
6. RECOMMENDATIONS AND FUTURE WORK
To improve the proposed method, the branches and leaves of the decision tree can be expanded to include more detail. A more detailed fuzzification of an email would include sending time, sending protocol, context length, context type, time zone, number of receivers, frequency, and number of attachments, which can increase the decision tree's accuracy in detecting spam.
Operations such as adding more characteristics to the fuzzy Mamdani decision tree and increasing the number of Mamdani rules improve efficiency. Adding non-implicit details from different parts of a letter, such as subject, content, sender, and effective keywords, to the Naive Bayesian method improves its performance in classifying letters. Finally, using both methods in the Bagging algorithm yields a better performance percentage. The larger the number of folds K, the higher the detection accuracy of the proposed method; in other words, the value of K used in the proposed algorithm correlates with diagnostic accuracy. More attention to the details of spam detection and to correct classification of emails increases accuracy. In addition, detecting and separating the implicit and non-implicit characteristics of an email, each handled by its own method, helps classify emails better. Note that paying more attention to the details of an email when detecting spam increases accuracy but decreases the simplicity and understandability of the method.
7. REFERENCES
[1]. Wu, C.T., Cheng, K.T., Zhu, Q., and Wu, Y.L., 2008,
“Using Visual Features For Anti-Spam Filtering”, In
Proceedings of the IEEE International Conference on Image
Processing, Vol. 29, Iss. 1, pp. 63-92.
[2]. Goodman, J., and Rounthwaite, R., 2004, “Stopping
Outgoing Spam”, In Proceedings of the 5th ACM Conference
on Electronic Commerce, pp. 30-39.
[3]. Siponen, M., and Stucke, C., 2006, “Effective Antispam
Strategies In Companies: An International Study”, In
Proceedings of the 39th IEEE Annual Hawaii International
Conference on System Sciences, Vol. 6, pp.
245-252.
[4]. Cody, S., Cukier, W., and Nesselroth, E., 2006, “Genres
Of Spam: Expectations And Deceptions”, In Proceedings of
the 39th Annual Hawaii International Conference on System
Sciences, Vol. 3, pp. 48-51.
[5]. Golbeck, J., and Hendler, J., 2006, “Reputation Network
Analysis For Email Filtering”, In Proceedings of the First
International Conference on Email and Anti-Spam, pp. 21-23.
[6]. Liang, Z., Jianmin, G., and Jian, H., 2012, “The Research
and Design of an Anti-open Junk Mail Relay System”, In
Proceedings of the First IEEE International Conference on
Computer Science and Service System, pp. 1258-1262.
[7]. Feamster, N., and Ramachandran, A., 2006,
“Understanding The Network-Level Behavior Of Spammers”,
In Proceedings of the 3rd ACM Conference on Email and Anti-Spam, Vol. 36, Iss. 4, pp. 291-302.
[8]. Lili, D., and Yun, W., 2011, “Research And Design Of
ID3 Algorithm Rules-Based Anti-Spam Email Filtering”, In
Proceedings of the Second IEEE International Conference on
Software Engineering and Service Science, pp. 572-575.
[9]. Zhitang, L., and Sheng, Z., 2009, “A Method for Spam
Behavior Recognition Based on Fuzzy Decision Tree”, In
Proceedings of the Ninth IEEE International Conference on
Computer and Information Technology , Vol. 2, pp. 236-241.
[10]. Duquenoy, P., Moustakas, E., and Ranganathan, E.,
2005, “Combating Spam Through Legislation: A Comparative
Analysis Of Us And European Approaches”, In Proceedings
of the Second International Conference on Email and Anti-Spam, pp. 15-22.
[11]. Jones, L., 2007, “Good Times Virus Hoax FAQ”,
Available: http://cityscope.net/hoax1.html, [Accessed: Jul. 10,
2015].
[12]. Singhal, A., 2007, “An Overview Of Data Warehouse,
Olap And Data Mining Technology”, Springer Science
Business Media, LLC, Vol. 31, pp. 19-23.
[13]. Ismaila, I., and Selamat, A., 2014, “Improved Email
Spam Detection Model With Negative Selection Algorithm
And Particle Swarm Optimization”, Elsevier Journal of
Alliance and Faculty of Computing, Vol. 22, pp. 15-27.