SlideShare a Scribd company logo
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1300
STUDY OF VARIOUS TECHNIQUES TO FILTER SPAM EMAILS
Jagdeep Kaur1, Priyanka2
1, 2 Computer Science and Engineering Swami Vivekanand Institute of Engineering and Technology
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Email is the most proficient andquickestmethod
of correspondence to trade data over the web. Because of the
expansion in the quantity of record holders over the different
social locales, there is a colossal increment in the rate of
spreading of spam messages. In spite of having different
apparatuses accessible still, there are many hotspots for the
spam to start. Electronic spam is the most troublesome
Internet wonder testing huge worldwide organizations,
counting AOL, Google, Yahoo and Microsoft. Spam causes
different issues that may, thus, cause financial misfortunes.
Spam causes activity issues and bottlenecks that point of
confinement memory space, registering power and speed.
Spam makes clients invest energy expelling it. Different
techniques have been produced to channel spam, including
black list/white list, Bayesian classification algorithms,
keyword matching, header information processing,
investigation of spam-sending factors and investigation of
received mails. In this paper we have discussed about spam
emails and various filtering and data mining techniques to
filter spam emails.
Key Words: Spam, Bayesian classification algorithm,
keyword matching, header information processing,
Black List, White List
1. INTRODUCTION
Internet has become an important and essential part of
human life. The increase in the utilization of internet has
increased the number of account holders over varioussocial
sites. Email is the simplest and fastest mode of
communication over the internet that is used both
personally and professionally. Due to the increase in the
number of account holders and an increase in the rate of
transmission of emails a serious issue of spam emails had
aroused. From a survey, it was analysed thatover294billion
emails are sent and received every day. Over90%emailsare
reported to be spam emails [1]. Emails are labelled into two
categories Spam emailsandHamemails.Spamemailsarethe
junk emails received from illegitimate users that might
contain advertisement, malicious code, Virus or to gain
personal profit from the user. Spam can be transmittedfrom
any source like Web, Text messages, Fax etc., depending
upon the mode of transmission spam canbecategorisedinto
various categories like email spam, web spam, text spam,
social networking spam [2]. The rate at which email
spamming is spreading is increasing tremendously because
of the fast and immodest way of sharing information. It was
reported that user receives more spam emails than ham
emails. Spam filtration is important because spam waste
time, energy, bandwidth, storage and consume other
resources [3].
Email can be categorised as a spam email if it shows
following characteristics [4]:
 Unsolicited Email: Email received from unknown
contact or illegitimate contact.
 Bulk Mailing: The type of email which is sent in bulk
to many users.
 Nameless Mails: The type of emails in which the
identity of the user is not shown or is hidden.
Spamming is a major issue and causes serious loss of
bandwidth and cost billon of dollars totheserviceproviders.
It is essential for distinguishing between the spam mail and
ham mail. Many algorithms are so far used to successfully
characterise the emails on their behaviourbut becauseofthe
changing technologies, hackers are becoming more
intelligent. So, better algorithms with high accuracy are
needed that successfully label an email as spam or ham
Email. Spam filter technique is used to label the email as a
junk and unwanted email and prevents it from entering the
authenticated account holder’s inbox.
2. FILTER TECHNIQUES
Filter techniques can be grouped into two categories [3]:
1. Machine Learning Based Technique: These
techniques are Support Vector Machine, Multi-Layer
Perceptron, Naïve BayesAlgorithm,DecisionTreeBased
etc.
2. Non-Machine Learning Based Technique: These
techniques are signature based, heuristic scanning,
blacklist/whitelist, sandboxing and mail header
scanning etc.
 Signatures: Signatures contains the information
taken from the documents. Signatures detect the
spam or threats by generating a unique value called a
hash value for each spam message. Signatures can be
generated in two ways firstly by fragmenting the
words into pairs and secondly by random generation
of numbers. Signature uses the hash value with the
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1301
new email value to compare and to analyze if the
email is spam or ham.
 Blacklist and Whitelist: A blacklist is a list of
spammers or any illegitimate contact that tries to
send a spam or malicious email whilewhitelistisa list
that contains legitimate users or contacts that are
known to an individual account holder.
 Heuristic Scanning: This technique uses rules to
detect malicious contents and threats. Heuristic
scanning is a faster and efficient technique that
detects the spam or threats without executing the file
and works by understanding the behaviour.Heuristic
scanning allows the user to change the rules.
 Mail Header Checking: In this technique set of rules
are developed that are matchedwiththe email header
to detect if the email is spam or ham. If the header of
the email matches the rules, then itinvokestheserver
and directs the emails that contain empty field of
“From”, confliction in “To”, conflictionin“Subject” etc.
3. DATA MINING
Data mining, which is also defined as a knowledge discovery
process, means a process of extraction of unknown and
potentially useful information (such as rules of knowledge,
constraints, regulations) from data in databases [7]. Data
Mining follows three main steps in preparing the data for
processing, reducing the data to concise it and extracting
useful information. The major algorithms followed in data
mining are classified into six classes [5]. The following steps
are executed on raw data to obtain relevant information.
1. Anomaly Detection: It is the identification of data
records that are not desirable and might contain an
error in it, say temperature is 45, this indicates a bogus
data without units.
2. Association Rule Mining (ARM): It is a process of
identifying linkage among the items present in the
database. ARM induces the relationship between the
items, say bread and butter or bread and jam.
3. Clustering: A descriptive process that groups the data
of same structure in one cluster without using a pre-
defined structure say, a mail is a spam or ham mail.
Clustering will group the set of data into two clusters
based on the characteristics generated viz. a mail canbe
spam depending upon the type of content in the mail or
a mail can be ham mail. Such as K-Means and K-Medoid.
4. Classification: A predictive processthatgeneralizesthe
known structure to new data. Such as Support vector
machine, Multi-Layer Perceptron.
Summarization: A process of representing the data in the
compact form for visualization.
Various data mining techniques and systems areavailableto
mine the data dependingupontheknowledgeto beacquired,
depending upon the techniques and depending upon the
databases [1].
1. Based on techniques: Data mining techniques
comprises of query-driven mining, knowledge mining,
data-driven mining, statistical mining, pattern based
mining, text mining and interactive data mining.
2. Based on the database: Several databasesareavailable
that are used for mining the useful patterns, such as a
spatial database, multimedia database, relational
database, transactional database, and web database.
3. Based on knowledge: The knowledge discovery
process, include association rule mining, classification,
clustering, and regression. Knowledge can be grouped
into multilevel knowledge, primitive knowledge and
general level knowledge.
4. LITERATURE REVIEW
Tiago A. Almeida and Akebo Yamakami (2010) performed a
comparative analysis using content-basedfilteringforspam.
This paper discussed seven different modified versions of
Naïve Bayes Classifier and compared those results with
Linear Support Vector Machine on six different open and
large datasets. The results demonstrated that SVM, Boolean
NB and Basic NB are the best algorithms for spam detection.
However, SVM executed the accuracy rate higher than 90%
for almost all the datasets utilized [5].
Loredana Firte,Camelia LemnaruandRodica Potolea (2010)
performed a comparative analysis on spam detection filter
using KNN Algorithm and Resampling approach. This paper
makes use of the K-NN algorithm for classification of spam
emails on the predefined dataset using feature’s selected
from the content and emails properties. Resampling of the
datasets to appropriate set and positive distribution was
carried out to make the algorithm efficient for feature
selection [2].
Ms.D.Karthika Renuka, Dr. T. Hamsapriya, et. al. (2011)
performed a comparative analysis of spam classification
based on supervisedlearningusingseveral machinelearning
techniques. In this analysis, the comparison was done using
three different machine learning classification algorithms
viz. Naïve Bayes, J48 and Multilayer perceptron (MLP)
classifier. Results demonstrated high accuracy for MLP but
high time consumption. WhileNaïveBayesaccuracywaslow
than MLP but was fast enough in executionandlearning. The
accuracy of Naïve Bayes was enhanced using FBL feature
selection and used filtered Bayesian Learning with Naïve
Bayes. The modified Naïve Bayes showed the accuracy of
91% [6].
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1302
Rushdi Shams and Robert E. Mercer (2013) performed a
comparative analysis of the classification of spam emails by
using text and readability features. This paper proposed an
efficient spam classification method along with feature
selection using the content of emails and readability. This
paper used four datasets such as CSDMC2010, Spam
Assassin, Ling Spam, and Enron-spam. Features are
categorized into three categories i.e.traditional features,test
features and readability features. The proposed approach is
able to classify emails of any language because the features
are kept independent of the languages. This paper used five
classification based algorithms for spam detection viz.
Random Forest (RF), Bagging, Adaboostm 1, Support Vector
Machine (SVM) and Naïve Bayes (NB). Results comparison
among differentclassifierspredictedBaggingalgorithmtobe
the best for spam detection [7].
Anirudh Harisinghaney, Aman Dixit, Saurabh Gupta and
Anuja Arora (2014)performeda comparativeanalysisoftext
and images by using KNN,NaïveBayesandReverse-DBSCAN
Algorithm for email spam detection. This analysis paper
proposed a methodology for detecting text andspamemails.
They used Naïve Bayes, K-NN and a modified Reverse
DBSCAN (Density- Based Spatial Clustering of Application
with Noise) algorithm. Authors used Enron dataset for text
and image spam classification. They used Google’s open
source library, Tesseract for extracting words from images.
Results show that these three machine learning algorithms
give better results without pre-processing among which
Naïve Bayes algorithm is highly accurate than other
algorithms [8].
Savita Pundalik Teli and Santosh Kumar Biradar (2014)
performed an analysis of effective email classification for
spam and non-spam emails. In this paper, the author
compares three classification techniques such as KNN,
Support Vector Machine and Naïve Bayes. She shows that
Naïve Bayes gives maximum accuracy among other
algorithms that is 94.2%. The author then proposed a
method to enhance the efficiency of Naïve Bayes. The
proposed method is divided into three phases. In first phase
the user creates rule for classification, second phase trains
the classifier with training set by extracting the tokens, and
in third phase based on maximum token matches, the email
is classified as spam or ham.TheperformanceofNaïveBayes
is improved by this Algorithm. They take parameters
Precision, Recall, Accuracy [9].
Izzat Alsmadi and Ikdam Alhami (2015) performed an
analysis on clustering and classificationofemail contentsfor
the detection of spam. This paper collected a large dataset of
personal emails for the spam detection of emails based on
folder and subject classification. Supervised approach viz.
classification alongside unsupervised approach viz.
clustering was performed on the personal dataset. This
paper used SVM classification algorithm for classifying the
data obtainedfromK-means clusteringalgorithm.Thispaper
performed three typesofclassificationviz.withoutremoving
stop words, removing stop words and using N-gram based
classification. The results clearly illustrated that N-gram
based classification for spam detection is the best approach
for large and Bi-language text [10].
Ryan McConville, X. Cao, W. Liu, P. Millerv (2016) gives a
general framework to accelerate existing algorithms to
cluster large-scale datasets which contain large numbers of
attributes, items, and clusters is proposed. This framework
makes use of locality sensitive hashing to significantly
reduce the cluster search space. This framework has a
guaranteed error bound in terms of the clustering quality.
This framework can be applied to a set of centroid-based
clustering algorithms that assign an object to the most
similar cluster. K-Modes categorical clustering algorithm to
present how the framework can be applied is adopted. The
framework with five synthetic datasets and a real world
Yahoo! Answers dataset was used. The experimental results
demonstrate that this framework is able to speed up the
existing clustering algorithm between factors of 2 and 6,
while maintaining comparable cluster purity [11].
N. Akhtar and N. Agarwal (2014) introduces a new process
for segmenting a photograph. It combines two learning
algorithms, particularly the k-approach Clustering and
Neutrosophic good judgment, together to obtain effective
results through eliminating the uncertainty of the pixels.
Forming hard clusters by applying neutrosophy before k –
mean algorithm [12].
C. Jacob and K A Abdul Nazeer (2014) made use of k-means
algorithm along withimprovedclusteringprocessantcolony
algorithm. The algorithm works on the principle of
probability following pick anddrop. Thealgorithmiscapable
enough for determiningoptimal numberofclustersandtheir
corresponding centroid’s. It eliminates the problem of local
optimal hence making the algorithm least dependent on
initial centroid’s [13].
5. PROPOSED WORK
The existing methods have some limitations of having less
accuracy and precision. The main objective of this proposed
work is to upgrade the existing machine learningtechniques
in distinguishing spam emails.
Objectives of proposed work are as follows:
1. To study various spam detection algorithms for emails.
2. To propose an approach for email spam detection using
improved MLP with N-gram feature selection.
3. To compare and analyse the results of proposed
approach with the existing on the basis of parameters
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1303
viz. Accuracy, Sensitivity, Specificity, Root Mean Square
Error and Precision.
6. CONCLUSION
In this paper, we have discussed about spam emails and its
characteristics. Further we elaborate various filtering
techniques like machine learning based technique and non-
machine learning based technique in which we cover
signatures, balcklist and whitelist, heuristic scanning, mail
header checking. We discussed about data mining, its
procedure and its techniques. Literature survey discusses
about the previous work. Existing system has some
drawbacks related to accuracy and precision. So to enhance
the current machine learning techniques, we proposed the
work using MLP with N-gram feature selection.
REFERENCES
[1] D. Jurafsky and J. H. Martin, “N-Gram,” Speech and
Language Processing, 2014.
[2] L. Firte, C. Lemnaru, and R. Potolea,“SpamDetection
Filter using KNN Algorithm and Resampling”, in6th
International Conference on Intelligent Computer
Communication and Processing -IEEE, pp.27-33,
2010.
[3] Kaur, R. K. Gurm, “A Survey on Classification
Techniques in Internet Environment”,International
Journal of Advance Research in Computer and
Communication Engineering (IJARCCE),vol.5,no.3,
pp. 589–593, March 2016.
[4] H Kaur, P. Verma, “Survey on E-Mail SpamDetection
Using Supervised ApproachwithFeatureSelection,”
International Journal of Engineering Sciences and
Research Technology (IJESRT), vol.6,no.4,pp.120-
128, April 2017.
[5] B. Yu and Z. Xu,“A comparative study for content-
based dynamic spam classification using four
machine learning algorithms”, Knowledge Based
System-Elsevier, vol. 21, pp. 355–362, 2008.
[6] D. K. Renuka, T. Hamsapriya, M. R. Chakkaravarthi
and P. L. Surya, “Spam Classification Based on
Supervised Learning Using Machine Learning
Techniques”, in 2011 International Conference on
Process Automation, Control and Computing-IEEE,
pp. 1–7, 2011.
[7] R. Shams and R. E. Mercer, “Classifying spam emails
using text and readability features”, inInternational
Conference on Data Mining (ICDM) - IEEE, pp. 657–
666, 2013.
[8] A. Harisinghaney, A. Dixit, S.Gupta,andAnuja Arora,
“Text and image based spam email classification
using KNN, Naïve Bayes and Reverse DBSCAN
Algorithm”, in International Conference on
Reliability, Optimization and Information
Technology (ICROIT)-IEEE, pp.153-155, 2014.
[9] S. P. Teli and S. K. Biradar, “Effective Email
Classification for Spam and Non- spam”,
International Journal of Advanced Research in
Computer and software Engineering, vol. 4, 2014.
[10] Alsmadi and I. Alhami, “Clustering andclassification
of email contents”, Journal of King Saud University -
Computer and InformationScience-Elsevier,vol.27,
no. 1, pp. 46–57, 2015.
[11] R. M. Conville, X. Cao, W. Liu, P. Millerv,
“Accelerating Large Scale Centroid-Based
Clustering with Locality Sensitive Hashing”, in-
International Conference on Data Engineering-
IEEE, pp. 649-660, 2016.
[12] N. Akhtar, N. Agarwal , ”K-Mean Algorithm For
Image Segmentation Using Neutrosophy”,in-
International Conference on Advances in
Computing Communications and Informatics-
IEEE, pp. 2417-2421, 2014.
[13] C. Jacob, K A Abdul Nazeer, “An Improved ICPACA
based K- Means Algorithm with Self Determined
Centroids, ”in- International Conference on Data
Science and Engineering- IEEE , pp. 89-93 , 2014.

More Related Content

What's hot

IRJET- Multimedia Content Security with Random Key Generation Approach in...
IRJET-  	  Multimedia Content Security with Random Key Generation Approach in...IRJET-  	  Multimedia Content Security with Random Key Generation Approach in...
IRJET- Multimedia Content Security with Random Key Generation Approach in...
IRJET Journal
 
IRJET - Identification and Classification of IoT Devices in Various Appli...
IRJET -  	  Identification and Classification of IoT Devices in Various Appli...IRJET -  	  Identification and Classification of IoT Devices in Various Appli...
IRJET - Identification and Classification of IoT Devices in Various Appli...
IRJET Journal
 
IRJET- Suspicious Mail Detection
IRJET-  	  Suspicious Mail DetectionIRJET-  	  Suspicious Mail Detection
IRJET- Suspicious Mail Detection
IRJET Journal
 
A LIGHT WEIGHT PROTOCOL TO PROVIDE LOCATION PRIVACY IN WIRELESS BODY AREA NET...
A LIGHT WEIGHT PROTOCOL TO PROVIDE LOCATION PRIVACY IN WIRELESS BODY AREA NET...A LIGHT WEIGHT PROTOCOL TO PROVIDE LOCATION PRIVACY IN WIRELESS BODY AREA NET...
A LIGHT WEIGHT PROTOCOL TO PROVIDE LOCATION PRIVACY IN WIRELESS BODY AREA NET...
IJNSA Journal
 
Cloud Data Security using Elliptic Curve Cryptography
Cloud Data Security using Elliptic Curve CryptographyCloud Data Security using Elliptic Curve Cryptography
Cloud Data Security using Elliptic Curve Cryptography
IRJET Journal
 
IRJET - DOD Data Hiding Technique using Advanced LSB with AES-256 Algorithm
IRJET -  	  DOD Data Hiding Technique using Advanced LSB with AES-256 AlgorithmIRJET -  	  DOD Data Hiding Technique using Advanced LSB with AES-256 Algorithm
IRJET - DOD Data Hiding Technique using Advanced LSB with AES-256 Algorithm
IRJET Journal
 
Identifying Malicious Data in Social Media
Identifying Malicious Data in Social MediaIdentifying Malicious Data in Social Media
Identifying Malicious Data in Social Media
IRJET Journal
 
IRJET- I-Share: A Secure Way to Share Images
IRJET- I-Share: A Secure Way to Share ImagesIRJET- I-Share: A Secure Way to Share Images
IRJET- I-Share: A Secure Way to Share Images
IRJET Journal
 
Design of Hybrid Cryptography Algorithm for Secure Communication
Design of Hybrid Cryptography Algorithm for Secure CommunicationDesign of Hybrid Cryptography Algorithm for Secure Communication
Design of Hybrid Cryptography Algorithm for Secure Communication
IRJET Journal
 
IRJET- Survey of Cryptographic Techniques to Certify Sharing of Informati...
IRJET-  	  Survey of Cryptographic Techniques to Certify Sharing of Informati...IRJET-  	  Survey of Cryptographic Techniques to Certify Sharing of Informati...
IRJET- Survey of Cryptographic Techniques to Certify Sharing of Informati...
IRJET Journal
 
Securing Messages from Brute Force Attack by Combined Approach of Honey Encry...
Securing Messages from Brute Force Attack by Combined Approach of Honey Encry...Securing Messages from Brute Force Attack by Combined Approach of Honey Encry...
Securing Messages from Brute Force Attack by Combined Approach of Honey Encry...
IRJET Journal
 
Dual method cryptography image by two force secure and steganography secret m...
Dual method cryptography image by two force secure and steganography secret m...Dual method cryptography image by two force secure and steganography secret m...
Dual method cryptography image by two force secure and steganography secret m...
TELKOMNIKA JOURNAL
 
Location Based Encryption-Decryption Approach for Data Security
Location Based Encryption-Decryption Approach for Data SecurityLocation Based Encryption-Decryption Approach for Data Security
Location Based Encryption-Decryption Approach for Data Security
Editor IJCATR
 
A survey on evil twin detection methods for wireless local area network
A survey on evil twin detection methods for wireless  local area networkA survey on evil twin detection methods for wireless  local area network
A survey on evil twin detection methods for wireless local area networkIAEME Publication
 
Analysis of network_security_threats_and_vulnerabilities_by_development__impl...
Analysis of network_security_threats_and_vulnerabilities_by_development__impl...Analysis of network_security_threats_and_vulnerabilities_by_development__impl...
Analysis of network_security_threats_and_vulnerabilities_by_development__impl...
Tương Hoàng
 
EPLQ:Efficient privacy preserving spatial range query for smart phones
EPLQ:Efficient privacy preserving spatial range query for smart phonesEPLQ:Efficient privacy preserving spatial range query for smart phones
EPLQ:Efficient privacy preserving spatial range query for smart phones
IRJET Journal
 
Asymmetrical Encryption for Wireless Sensor Networks: A Comparative Study
Asymmetrical Encryption for Wireless Sensor Networks: A Comparative StudyAsymmetrical Encryption for Wireless Sensor Networks: A Comparative Study
Asymmetrical Encryption for Wireless Sensor Networks: A Comparative Study
IRJET Journal
 
IRJET- A Survey on Cryptography, Encryption and Compression Techniques
IRJET- A Survey on Cryptography, Encryption and Compression TechniquesIRJET- A Survey on Cryptography, Encryption and Compression Techniques
IRJET- A Survey on Cryptography, Encryption and Compression Techniques
IRJET Journal
 
IRJET- Implementation of DNA Cryptography in Cloud Computing and using Socket...
IRJET- Implementation of DNA Cryptography in Cloud Computing and using Socket...IRJET- Implementation of DNA Cryptography in Cloud Computing and using Socket...
IRJET- Implementation of DNA Cryptography in Cloud Computing and using Socket...
IRJET Journal
 
A Review Study on Secure Authentication in Mobile System
A Review Study on Secure Authentication in Mobile SystemA Review Study on Secure Authentication in Mobile System
A Review Study on Secure Authentication in Mobile SystemEditor IJCATR
 

What's hot (20)

IRJET- Multimedia Content Security with Random Key Generation Approach in...
IRJET-  	  Multimedia Content Security with Random Key Generation Approach in...IRJET-  	  Multimedia Content Security with Random Key Generation Approach in...
IRJET- Multimedia Content Security with Random Key Generation Approach in...
 
IRJET - Identification and Classification of IoT Devices in Various Appli...
IRJET -  	  Identification and Classification of IoT Devices in Various Appli...IRJET -  	  Identification and Classification of IoT Devices in Various Appli...
IRJET - Identification and Classification of IoT Devices in Various Appli...
 
IRJET- Suspicious Mail Detection
IRJET-  	  Suspicious Mail DetectionIRJET-  	  Suspicious Mail Detection
IRJET- Suspicious Mail Detection
 
A LIGHT WEIGHT PROTOCOL TO PROVIDE LOCATION PRIVACY IN WIRELESS BODY AREA NET...
A LIGHT WEIGHT PROTOCOL TO PROVIDE LOCATION PRIVACY IN WIRELESS BODY AREA NET...A LIGHT WEIGHT PROTOCOL TO PROVIDE LOCATION PRIVACY IN WIRELESS BODY AREA NET...
A LIGHT WEIGHT PROTOCOL TO PROVIDE LOCATION PRIVACY IN WIRELESS BODY AREA NET...
 
Cloud Data Security using Elliptic Curve Cryptography
Cloud Data Security using Elliptic Curve CryptographyCloud Data Security using Elliptic Curve Cryptography
Cloud Data Security using Elliptic Curve Cryptography
 
IRJET - DOD Data Hiding Technique using Advanced LSB with AES-256 Algorithm
IRJET -  	  DOD Data Hiding Technique using Advanced LSB with AES-256 AlgorithmIRJET -  	  DOD Data Hiding Technique using Advanced LSB with AES-256 Algorithm
IRJET - DOD Data Hiding Technique using Advanced LSB with AES-256 Algorithm
 
Identifying Malicious Data in Social Media
Identifying Malicious Data in Social MediaIdentifying Malicious Data in Social Media
Identifying Malicious Data in Social Media
 
IRJET- I-Share: A Secure Way to Share Images
IRJET- I-Share: A Secure Way to Share ImagesIRJET- I-Share: A Secure Way to Share Images
IRJET- I-Share: A Secure Way to Share Images
 
Design of Hybrid Cryptography Algorithm for Secure Communication
Design of Hybrid Cryptography Algorithm for Secure CommunicationDesign of Hybrid Cryptography Algorithm for Secure Communication
Design of Hybrid Cryptography Algorithm for Secure Communication
 
IRJET- Survey of Cryptographic Techniques to Certify Sharing of Informati...
IRJET-  	  Survey of Cryptographic Techniques to Certify Sharing of Informati...IRJET-  	  Survey of Cryptographic Techniques to Certify Sharing of Informati...
IRJET- Survey of Cryptographic Techniques to Certify Sharing of Informati...
 
Securing Messages from Brute Force Attack by Combined Approach of Honey Encry...
Securing Messages from Brute Force Attack by Combined Approach of Honey Encry...Securing Messages from Brute Force Attack by Combined Approach of Honey Encry...
Securing Messages from Brute Force Attack by Combined Approach of Honey Encry...
 
Dual method cryptography image by two force secure and steganography secret m...
Dual method cryptography image by two force secure and steganography secret m...Dual method cryptography image by two force secure and steganography secret m...
Dual method cryptography image by two force secure and steganography secret m...
 
Location Based Encryption-Decryption Approach for Data Security
Location Based Encryption-Decryption Approach for Data SecurityLocation Based Encryption-Decryption Approach for Data Security
Location Based Encryption-Decryption Approach for Data Security
 
A survey on evil twin detection methods for wireless local area network
A survey on evil twin detection methods for wireless  local area networkA survey on evil twin detection methods for wireless  local area network
A survey on evil twin detection methods for wireless local area network
 
Analysis of network_security_threats_and_vulnerabilities_by_development__impl...
Analysis of network_security_threats_and_vulnerabilities_by_development__impl...Analysis of network_security_threats_and_vulnerabilities_by_development__impl...
Analysis of network_security_threats_and_vulnerabilities_by_development__impl...
 
EPLQ:Efficient privacy preserving spatial range query for smart phones
EPLQ:Efficient privacy preserving spatial range query for smart phonesEPLQ:Efficient privacy preserving spatial range query for smart phones
EPLQ:Efficient privacy preserving spatial range query for smart phones
 
Asymmetrical Encryption for Wireless Sensor Networks: A Comparative Study
Asymmetrical Encryption for Wireless Sensor Networks: A Comparative StudyAsymmetrical Encryption for Wireless Sensor Networks: A Comparative Study
Asymmetrical Encryption for Wireless Sensor Networks: A Comparative Study
 
IRJET- A Survey on Cryptography, Encryption and Compression Techniques
IRJET- A Survey on Cryptography, Encryption and Compression TechniquesIRJET- A Survey on Cryptography, Encryption and Compression Techniques
IRJET- A Survey on Cryptography, Encryption and Compression Techniques
 
IRJET- Implementation of DNA Cryptography in Cloud Computing and using Socket...
IRJET- Implementation of DNA Cryptography in Cloud Computing and using Socket...IRJET- Implementation of DNA Cryptography in Cloud Computing and using Socket...
IRJET- Implementation of DNA Cryptography in Cloud Computing and using Socket...
 
A Review Study on Secure Authentication in Mobile System
A Review Study on Secure Authentication in Mobile SystemA Review Study on Secure Authentication in Mobile System
A Review Study on Secure Authentication in Mobile System
 

Similar to Study of Various Techniques to Filter Spam Emails

An Approach for Malicious Spam Detection in Email with Comparison of Differen...
An Approach for Malicious Spam Detection in Email with Comparison of Differen...An Approach for Malicious Spam Detection in Email with Comparison of Differen...
An Approach for Malicious Spam Detection in Email with Comparison of Differen...
IRJET Journal
 
Detection of Spam in Emails using Machine Learning
Detection of Spam in Emails using Machine LearningDetection of Spam in Emails using Machine Learning
Detection of Spam in Emails using Machine Learning
IRJET Journal
 
The Detection of Suspicious Email Based on Decision Tree ...
The Detection of Suspicious Email Based on Decision Tree                     ...The Detection of Suspicious Email Based on Decision Tree                     ...
The Detection of Suspicious Email Based on Decision Tree ...
IRJET Journal
 
Overview of Anti-spam filtering Techniques
Overview of Anti-spam filtering TechniquesOverview of Anti-spam filtering Techniques
Overview of Anti-spam filtering Techniques
IRJET Journal
 
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHMEMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
IRJET Journal
 
A Heuristic Approach for Network Data Clustering
A Heuristic Approach for Network Data ClusteringA Heuristic Approach for Network Data Clustering
A Heuristic Approach for Network Data Clustering
idescitation
 
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
IJNSA Journal
 
Detecting spam mail using machine learning algorithm
Detecting spam mail using machine learning algorithmDetecting spam mail using machine learning algorithm
Detecting spam mail using machine learning algorithm
IRJET Journal
 
IRJET- Email Spam Detection & Automation
IRJET- Email Spam Detection & AutomationIRJET- Email Spam Detection & Automation
IRJET- Email Spam Detection & Automation
IRJET Journal
 
Identifying Valid Email Spam Emails Using Decision Tree
Identifying Valid Email Spam Emails Using Decision TreeIdentifying Valid Email Spam Emails Using Decision Tree
Identifying Valid Email Spam Emails Using Decision Tree
Editor IJCATR
 
IRJET- Analysis and Detection of E-Mail Phishing using Pyspark
IRJET- Analysis and Detection of E-Mail Phishing using PysparkIRJET- Analysis and Detection of E-Mail Phishing using Pyspark
IRJET- Analysis and Detection of E-Mail Phishing using Pyspark
IRJET Journal
 
Identification of Spam Emails from Valid Emails by Using Voting
Identification of Spam Emails from Valid Emails by Using VotingIdentification of Spam Emails from Valid Emails by Using Voting
Identification of Spam Emails from Valid Emails by Using Voting
Editor IJCATR
 
A Survey on Spam Filtering Methods and Mapreduce with SVM
A Survey on Spam Filtering Methods and Mapreduce with SVMA Survey on Spam Filtering Methods and Mapreduce with SVM
A Survey on Spam Filtering Methods and Mapreduce with SVM
IRJET Journal
 
IRJET- Suspicious Email Detection System
IRJET- Suspicious Email Detection SystemIRJET- Suspicious Email Detection System
IRJET- Suspicious Email Detection System
IRJET Journal
 
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMS
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMSWORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMS
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMS
IJNSA Journal
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesiaemedu
 
A multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detectionA multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detection
eSAT Publishing House
 
A multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detectionA multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detection
eSAT Publishing House
 
A multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detectionA multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detection
eSAT Journals
 
A LITERATURE REVIEW ON PHISHING EMAIL DETECTION USING DATA MINING
A LITERATURE REVIEW ON PHISHING EMAIL DETECTION USING DATA MININGA LITERATURE REVIEW ON PHISHING EMAIL DETECTION USING DATA MINING
A LITERATURE REVIEW ON PHISHING EMAIL DETECTION USING DATA MINING
Heather Strinden
 

Similar to Study of Various Techniques to Filter Spam Emails (20)

An Approach for Malicious Spam Detection in Email with Comparison of Differen...
An Approach for Malicious Spam Detection in Email with Comparison of Differen...An Approach for Malicious Spam Detection in Email with Comparison of Differen...
An Approach for Malicious Spam Detection in Email with Comparison of Differen...
 
Detection of Spam in Emails using Machine Learning
Detection of Spam in Emails using Machine LearningDetection of Spam in Emails using Machine Learning
Detection of Spam in Emails using Machine Learning
 
The Detection of Suspicious Email Based on Decision Tree ...
The Detection of Suspicious Email Based on Decision Tree                     ...The Detection of Suspicious Email Based on Decision Tree                     ...
The Detection of Suspicious Email Based on Decision Tree ...
 
Overview of Anti-spam filtering Techniques
Overview of Anti-spam filtering TechniquesOverview of Anti-spam filtering Techniques
Overview of Anti-spam filtering Techniques
 
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHMEMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
 
A Heuristic Approach for Network Data Clustering
A Heuristic Approach for Network Data ClusteringA Heuristic Approach for Network Data Clustering
A Heuristic Approach for Network Data Clustering
 
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
 
Detecting spam mail using machine learning algorithm
Detecting spam mail using machine learning algorithmDetecting spam mail using machine learning algorithm
Detecting spam mail using machine learning algorithm
 
IRJET- Email Spam Detection & Automation
IRJET- Email Spam Detection & AutomationIRJET- Email Spam Detection & Automation
IRJET- Email Spam Detection & Automation
 
Identifying Valid Email Spam Emails Using Decision Tree
Identifying Valid Email Spam Emails Using Decision TreeIdentifying Valid Email Spam Emails Using Decision Tree
Identifying Valid Email Spam Emails Using Decision Tree
 
IRJET- Analysis and Detection of E-Mail Phishing using Pyspark
IRJET- Analysis and Detection of E-Mail Phishing using PysparkIRJET- Analysis and Detection of E-Mail Phishing using Pyspark
IRJET- Analysis and Detection of E-Mail Phishing using Pyspark
 
Identification of Spam Emails from Valid Emails by Using Voting
Identification of Spam Emails from Valid Emails by Using VotingIdentification of Spam Emails from Valid Emails by Using Voting
Identification of Spam Emails from Valid Emails by Using Voting
 
A Survey on Spam Filtering Methods and Mapreduce with SVM
A Survey on Spam Filtering Methods and Mapreduce with SVMA Survey on Spam Filtering Methods and Mapreduce with SVM
A Survey on Spam Filtering Methods and Mapreduce with SVM
 
IRJET- Suspicious Email Detection System
IRJET- Suspicious Email Detection SystemIRJET- Suspicious Email Detection System
IRJET- Suspicious Email Detection System
 
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMS
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMSWORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMS
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMS
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniques
 
A multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detectionA multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detection
 
A multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detectionA multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detection
 
A multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detectionA multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detection
 
A LITERATURE REVIEW ON PHISHING EMAIL DETECTION USING DATA MINING
A LITERATURE REVIEW ON PHISHING EMAIL DETECTION USING DATA MININGA LITERATURE REVIEW ON PHISHING EMAIL DETECTION USING DATA MINING
A LITERATURE REVIEW ON PHISHING EMAIL DETECTION USING DATA MINING
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
IRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
IRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
IRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
IRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
IRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
IRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
IRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
IRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
PrashantGoswami42
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
MuhammadTufail242431
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
AafreenAbuthahir2
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 

Recently uploaded (20)

AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 

Study of Various Techniques to Filter Spam Emails

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1300 STUDY OF VARIOUS TECHNIQUES TO FILTER SPAM EMAILS Jagdeep Kaur1, Priyanka2 1, 2 Computer Science and Engineering Swami Vivekanand Institute of Engineering and Technology ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - Email is the most proficient andquickestmethod of correspondence to trade data over the web. Because of the expansion in the quantity of record holders over the different social locales, there is a colossal increment in the rate of spreading of spam messages. In spite of having different apparatuses accessible still, there are many hotspots for the spam to start. Electronic spam is the most troublesome Internet wonder testing huge worldwide organizations, counting AOL, Google, Yahoo and Microsoft. Spam causes different issues that may, thus, cause financial misfortunes. Spam causes activity issues and bottlenecks that point of confinement memory space, registering power and speed. Spam makes clients invest energy expelling it. Different techniques have been produced to channel spam, including black list/white list, Bayesian classification algorithms, keyword matching, header information processing, investigation of spam-sending factors and investigation of received mails. In this paper we have discussed about spam emails and various filtering and data mining techniques to filter spam emails. Key Words: Spam, Bayesian classification algorithm, keyword matching, header information processing, Black List, White List 1. INTRODUCTION Internet has become an important and essential part of human life. The increase in the utilization of internet has increased the number of account holders over varioussocial sites. Email is the simplest and fastest mode of communication over the internet that is used both personally and professionally. Due to the increase in the number of account holders and an increase in the rate of transmission of emails a serious issue of spam emails had aroused. From a survey, it was analysed thatover294billion emails are sent and received every day. Over90%emailsare reported to be spam emails [1]. Emails are labelled into two categories Spam emailsandHamemails.Spamemailsarethe junk emails received from illegitimate users that might contain advertisement, malicious code, Virus or to gain personal profit from the user. Spam can be transmittedfrom any source like Web, Text messages, Fax etc., depending upon the mode of transmission spam canbecategorisedinto various categories like email spam, web spam, text spam, social networking spam [2]. The rate at which email spamming is spreading is increasing tremendously because of the fast and immodest way of sharing information. It was reported that user receives more spam emails than ham emails. Spam filtration is important because spam waste time, energy, bandwidth, storage and consume other resources [3]. Email can be categorised as a spam email if it shows following characteristics [4]:  Unsolicited Email: Email received from unknown contact or illegitimate contact.  Bulk Mailing: The type of email which is sent in bulk to many users.  Nameless Mails: The type of emails in which the identity of the user is not shown or is hidden. Spamming is a major issue and causes serious loss of bandwidth and cost billon of dollars totheserviceproviders. It is essential for distinguishing between the spam mail and ham mail. Many algorithms are so far used to successfully characterise the emails on their behaviourbut becauseofthe changing technologies, hackers are becoming more intelligent. So, better algorithms with high accuracy are needed that successfully label an email as spam or ham Email. Spam filter technique is used to label the email as a junk and unwanted email and prevents it from entering the authenticated account holder’s inbox. 2. FILTER TECHNIQUES Filter techniques can be grouped into two categories [3]: 1. Machine Learning Based Technique: These techniques are Support Vector Machine, Multi-Layer Perceptron, Naïve BayesAlgorithm,DecisionTreeBased etc. 2. Non-Machine Learning Based Technique: These techniques are signature based, heuristic scanning, blacklist/whitelist, sandboxing and mail header scanning etc.  Signatures: Signatures contains the information taken from the documents. Signatures detect the spam or threats by generating a unique value called a hash value for each spam message. Signatures can be generated in two ways firstly by fragmenting the words into pairs and secondly by random generation of numbers. Signature uses the hash value with the
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1301 new email value to compare and to analyze if the email is spam or ham.  Blacklist and Whitelist: A blacklist is a list of spammers or any illegitimate contact that tries to send a spam or malicious email whilewhitelistisa list that contains legitimate users or contacts that are known to an individual account holder.  Heuristic Scanning: This technique uses rules to detect malicious contents and threats. Heuristic scanning is a faster and efficient technique that detects the spam or threats without executing the file and works by understanding the behaviour.Heuristic scanning allows the user to change the rules.  Mail Header Checking: In this technique set of rules are developed that are matchedwiththe email header to detect if the email is spam or ham. If the header of the email matches the rules, then itinvokestheserver and directs the emails that contain empty field of “From”, confliction in “To”, conflictionin“Subject” etc. 3. DATA MINING Data mining, which is also defined as a knowledge discovery process, means a process of extraction of unknown and potentially useful information (such as rules of knowledge, constraints, regulations) from data in databases [7]. Data Mining follows three main steps in preparing the data for processing, reducing the data to concise it and extracting useful information. The major algorithms followed in data mining are classified into six classes [5]. The following steps are executed on raw data to obtain relevant information. 1. Anomaly Detection: It is the identification of data records that are not desirable and might contain an error in it, say temperature is 45, this indicates a bogus data without units. 2. Association Rule Mining (ARM): It is a process of identifying linkage among the items present in the database. ARM induces the relationship between the items, say bread and butter or bread and jam. 3. Clustering: A descriptive process that groups the data of same structure in one cluster without using a pre- defined structure say, a mail is a spam or ham mail. Clustering will group the set of data into two clusters based on the characteristics generated viz. a mail canbe spam depending upon the type of content in the mail or a mail can be ham mail. Such as K-Means and K-Medoid. 4. Classification: A predictive processthatgeneralizesthe known structure to new data. Such as Support vector machine, Multi-Layer Perceptron. Summarization: A process of representing the data in the compact form for visualization. Various data mining techniques and systems areavailableto mine the data dependingupontheknowledgeto beacquired, depending upon the techniques and depending upon the databases [1]. 1. Based on techniques: Data mining techniques comprises of query-driven mining, knowledge mining, data-driven mining, statistical mining, pattern based mining, text mining and interactive data mining. 2. Based on the database: Several databasesareavailable that are used for mining the useful patterns, such as a spatial database, multimedia database, relational database, transactional database, and web database. 3. Based on knowledge: The knowledge discovery process, include association rule mining, classification, clustering, and regression. Knowledge can be grouped into multilevel knowledge, primitive knowledge and general level knowledge. 4. LITERATURE REVIEW Tiago A. Almeida and Akebo Yamakami (2010) performed a comparative analysis using content-basedfilteringforspam. This paper discussed seven different modified versions of Naïve Bayes Classifier and compared those results with Linear Support Vector Machine on six different open and large datasets. The results demonstrated that SVM, Boolean NB and Basic NB are the best algorithms for spam detection. However, SVM executed the accuracy rate higher than 90% for almost all the datasets utilized [5]. Loredana Firte,Camelia LemnaruandRodica Potolea (2010) performed a comparative analysis on spam detection filter using KNN Algorithm and Resampling approach. This paper makes use of the K-NN algorithm for classification of spam emails on the predefined dataset using feature’s selected from the content and emails properties. Resampling of the datasets to appropriate set and positive distribution was carried out to make the algorithm efficient for feature selection [2]. Ms.D.Karthika Renuka, Dr. T. Hamsapriya, et. al. (2011) performed a comparative analysis of spam classification based on supervisedlearningusingseveral machinelearning techniques. In this analysis, the comparison was done using three different machine learning classification algorithms viz. Naïve Bayes, J48 and Multilayer perceptron (MLP) classifier. Results demonstrated high accuracy for MLP but high time consumption. WhileNaïveBayesaccuracywaslow than MLP but was fast enough in executionandlearning. The accuracy of Naïve Bayes was enhanced using FBL feature selection and used filtered Bayesian Learning with Naïve Bayes. The modified Naïve Bayes showed the accuracy of 91% [6].
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1302 Rushdi Shams and Robert E. Mercer (2013) performed a comparative analysis of the classification of spam emails by using text and readability features. This paper proposed an efficient spam classification method along with feature selection using the content of emails and readability. This paper used four datasets such as CSDMC2010, Spam Assassin, Ling Spam, and Enron-spam. Features are categorized into three categories i.e.traditional features,test features and readability features. The proposed approach is able to classify emails of any language because the features are kept independent of the languages. This paper used five classification based algorithms for spam detection viz. Random Forest (RF), Bagging, Adaboostm 1, Support Vector Machine (SVM) and Naïve Bayes (NB). Results comparison among differentclassifierspredictedBaggingalgorithmtobe the best for spam detection [7]. Anirudh Harisinghaney, Aman Dixit, Saurabh Gupta and Anuja Arora (2014)performeda comparativeanalysisoftext and images by using KNN,NaïveBayesandReverse-DBSCAN Algorithm for email spam detection. This analysis paper proposed a methodology for detecting text andspamemails. They used Naïve Bayes, K-NN and a modified Reverse DBSCAN (Density- Based Spatial Clustering of Application with Noise) algorithm. Authors used Enron dataset for text and image spam classification. They used Google’s open source library, Tesseract for extracting words from images. Results show that these three machine learning algorithms give better results without pre-processing among which Naïve Bayes algorithm is highly accurate than other algorithms [8]. Savita Pundalik Teli and Santosh Kumar Biradar (2014) performed an analysis of effective email classification for spam and non-spam emails. In this paper, the author compares three classification techniques such as KNN, Support Vector Machine and Naïve Bayes. She shows that Naïve Bayes gives maximum accuracy among other algorithms that is 94.2%. The author then proposed a method to enhance the efficiency of Naïve Bayes. The proposed method is divided into three phases. In first phase the user creates rule for classification, second phase trains the classifier with training set by extracting the tokens, and in third phase based on maximum token matches, the email is classified as spam or ham.TheperformanceofNaïveBayes is improved by this Algorithm. They take parameters Precision, Recall, Accuracy [9]. Izzat Alsmadi and Ikdam Alhami (2015) performed an analysis on clustering and classificationofemail contentsfor the detection of spam. This paper collected a large dataset of personal emails for the spam detection of emails based on folder and subject classification. Supervised approach viz. classification alongside unsupervised approach viz. clustering was performed on the personal dataset. This paper used SVM classification algorithm for classifying the data obtainedfromK-means clusteringalgorithm.Thispaper performed three typesofclassificationviz.withoutremoving stop words, removing stop words and using N-gram based classification. The results clearly illustrated that N-gram based classification for spam detection is the best approach for large and Bi-language text [10]. Ryan McConville, X. Cao, W. Liu, P. Millerv (2016) gives a general framework to accelerate existing algorithms to cluster large-scale datasets which contain large numbers of attributes, items, and clusters is proposed. This framework makes use of locality sensitive hashing to significantly reduce the cluster search space. This framework has a guaranteed error bound in terms of the clustering quality. This framework can be applied to a set of centroid-based clustering algorithms that assign an object to the most similar cluster. K-Modes categorical clustering algorithm to present how the framework can be applied is adopted. The framework with five synthetic datasets and a real world Yahoo! Answers dataset was used. The experimental results demonstrate that this framework is able to speed up the existing clustering algorithm between factors of 2 and 6, while maintaining comparable cluster purity [11]. N. Akhtar and N. Agarwal (2014) introduces a new process for segmenting a photograph. It combines two learning algorithms, particularly the k-approach Clustering and Neutrosophic good judgment, together to obtain effective results through eliminating the uncertainty of the pixels. Forming hard clusters by applying neutrosophy before k – mean algorithm [12]. C. Jacob and K A Abdul Nazeer (2014) made use of k-means algorithm along withimprovedclusteringprocessantcolony algorithm. The algorithm works on the principle of probability following pick anddrop. Thealgorithmiscapable enough for determiningoptimal numberofclustersandtheir corresponding centroid’s. It eliminates the problem of local optimal hence making the algorithm least dependent on initial centroid’s [13]. 5. PROPOSED WORK The existing methods have some limitations of having less accuracy and precision. The main objective of this proposed work is to upgrade the existing machine learningtechniques in distinguishing spam emails. Objectives of proposed work are as follows: 1. To study various spam detection algorithms for emails. 2. To propose an approach for email spam detection using improved MLP with N-gram feature selection. 3. To compare and analyse the results of proposed approach with the existing on the basis of parameters
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1303 viz. Accuracy, Sensitivity, Specificity, Root Mean Square Error and Precision. 6. CONCLUSION In this paper, we have discussed about spam emails and its characteristics. Further we elaborate various filtering techniques like machine learning based technique and non- machine learning based technique in which we cover signatures, balcklist and whitelist, heuristic scanning, mail header checking. We discussed about data mining, its procedure and its techniques. Literature survey discusses about the previous work. Existing system has some drawbacks related to accuracy and precision. So to enhance the current machine learning techniques, we proposed the work using MLP with N-gram feature selection. REFERENCES [1] D. Jurafsky and J. H. Martin, “N-Gram,” Speech and Language Processing, 2014. [2] L. Firte, C. Lemnaru, and R. Potolea,“SpamDetection Filter using KNN Algorithm and Resampling”, in6th International Conference on Intelligent Computer Communication and Processing -IEEE, pp.27-33, 2010. [3] Kaur, R. K. Gurm, “A Survey on Classification Techniques in Internet Environment”,International Journal of Advance Research in Computer and Communication Engineering (IJARCCE),vol.5,no.3, pp. 589–593, March 2016. [4] H Kaur, P. Verma, “Survey on E-Mail SpamDetection Using Supervised ApproachwithFeatureSelection,” International Journal of Engineering Sciences and Research Technology (IJESRT), vol.6,no.4,pp.120- 128, April 2017. [5] B. Yu and Z. Xu,“A comparative study for content- based dynamic spam classification using four machine learning algorithms”, Knowledge Based System-Elsevier, vol. 21, pp. 355–362, 2008. [6] D. K. Renuka, T. Hamsapriya, M. R. Chakkaravarthi and P. L. Surya, “Spam Classification Based on Supervised Learning Using Machine Learning Techniques”, in 2011 International Conference on Process Automation, Control and Computing-IEEE, pp. 1–7, 2011. [7] R. Shams and R. E. Mercer, “Classifying spam emails using text and readability features”, inInternational Conference on Data Mining (ICDM) - IEEE, pp. 657– 666, 2013. [8] A. Harisinghaney, A. Dixit, S.Gupta,andAnuja Arora, “Text and image based spam email classification using KNN, Naïve Bayes and Reverse DBSCAN Algorithm”, in International Conference on Reliability, Optimization and Information Technology (ICROIT)-IEEE, pp.153-155, 2014. [9] S. P. Teli and S. K. Biradar, “Effective Email Classification for Spam and Non- spam”, International Journal of Advanced Research in Computer and software Engineering, vol. 4, 2014. [10] Alsmadi and I. Alhami, “Clustering andclassification of email contents”, Journal of King Saud University - Computer and InformationScience-Elsevier,vol.27, no. 1, pp. 46–57, 2015. [11] R. M. Conville, X. Cao, W. Liu, P. Millerv, “Accelerating Large Scale Centroid-Based Clustering with Locality Sensitive Hashing”, in- International Conference on Data Engineering- IEEE, pp. 649-660, 2016. [12] N. Akhtar, N. Agarwal , ”K-Mean Algorithm For Image Segmentation Using Neutrosophy”,in- International Conference on Advances in Computing Communications and Informatics- IEEE, pp. 2417-2421, 2014. [13] C. Jacob, K A Abdul Nazeer, “An Improved ICPACA based K- Means Algorithm with Self Determined Centroids, ”in- International Conference on Data Science and Engineering- IEEE , pp. 89-93 , 2014.