International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 03 | Mar -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 490
A Survey on Spam Filtering Methods and Mapreduce with SVM
Dipika Somvanshi1, Prof. Kanchan Doke2
1 ME (Comp.) Student, Bharati Vidyapeeth College of Engineering, Kharghar, Navi Mumbai, India
2 Professor, Computer Dept., Bharati Vidyapeeth College of Engineering, Kharghar, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Spam is any unwanted and harmful mail sent in
bulk to a massive number of recipients. Spam can be
harmful because it may contain malware and links to phishing
websites, so separating spam from normal mail into a
separate folder is essential. Techniques for separating spam
mail are word based, content based, machine learning based and
hybrid. Machine learning techniques are the most popular
because of their high accuracy and mathematical support. This
paper surveys different spam filtering techniques. SVM is a
popular machine learning technique in spam filtering because it
can handle data with a large number of attributes.
SVM requires more time to train and cannot be
trained on a large dataset; these drawbacks
can be minimized by introducing the MapReduce framework for
SVM. The MapReduce framework can work in parallel on chunks
of the input dataset file to train the SVM in less time. This
paper aims to survey a few such spam filtering techniques and
the scope for introducing MapReduce with SVM.
Key Words: Spam Filtering; Machine Learning
Techniques; Naïve Bayes; KNN; Decision Trees; SVM;
MapReduce
1. INTRODUCTION
The email system is one of the most used communication
tools. Email is a quick means of communication because one
does not have to wait for a response, and it is a
straightforward way to stay in touch with everyone. One major
threat to an email system is spam email. Spam email is
unwanted mail sent in bulk by spammers for their own
advantage. Spammers are people who intend to spread malicious
content, advertisements and links to phishing websites
through email. Spam emails overload server bandwidth and
storage and add the cost and time of separating spam emails
from ham emails. According to the SMX email security
provider, the live spam percentage is about 79.5%. The average
size of a spam message is 16 KB [1]. Classifying emails as
spam or ham is therefore an important issue.
Spam filtering is needed to separate such spam from important
mail. Various spam filtering techniques exist in the
literature. They are classified as machine learning based,
content based, list based and hybrid. Among them, machine
learning techniques give more accurate results because of their
mathematical background. Machine learning techniques
work with data mining algorithms and give more satisfactory
results. For spam filtering, filters are trained with these
algorithms on a sample dataset of emails and then tested on a
new sample of emails. Machine learning based spam
classification algorithms include SVM, Naïve Bayes, KNN and
decision trees. Among these, Naïve Bayes classification and
the Support Vector Machine are the most used and appreciated
by researchers. A number of freeware and paid tools for spam
filtering are also available, and they also make use of these
techniques. Support Vector Machines (SVM) can be applied
efficiently in spam filtering. SVM works with a kernel function
and gives the most satisfactory results in spam filtering. SVM
works best with a small input dataset, but its performance
degrades as the dataset grows, and it takes a long
time to train the filter. This issue needs to be addressed.
MapReduce can be used effectively for training on a large
input dataset because it can process large data in less time.
In the proposed system, we therefore use MapReduce with SVM to
classify a large set of emails into spam and ham. MapReduce
with SVM gives more speedup than the traditional SVM
algorithm.
2. LITERATURE REVIEW OF MACHINE LEARNING
TECHNIQUES
A. Clustering:
Clustering is used to separate objects into
related classes called clusters. It groups objects or
observations in such a way that objects in a group are more
similar to each other than to those in other groups.[2]
K-Nearest Neighbor:
It is one of the simplest machine learning algorithms. KNN
works with a 'characteristic vector'. Characteristic
vectors are used to measure the similarity among messages. In
this algorithm a new incoming email is classified on the basis
of the distance of that mail from both classes. The distance
can be calculated as the Euclidean distance. KNN consists of
two phases.
Training phase: Storing the feature vectors and class labels of
the training emails.
Classification phase: k is a user-defined constant, and an
unlabeled email is classified by assigning the label that is
most frequent among the k training email samples nearest to
the new incoming email.
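The two phases above can be sketched as follows. This is a minimal
illustration, not the implementation of any surveyed paper: the
two-dimensional feature vectors (imagined here as counts of suspicious
words and of links), the sample data and the function names are all
assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, training, k=3):
    """Return the most frequent label among the k nearest training emails.

    training is a list of (feature_vector, label) pairs; storing this
    list is the whole of the training phase.
    """
    nearest = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (suspicious-word count, link count) -> label.
TRAINING = [
    ((5.0, 4.0), "spam"),
    ((6.0, 5.0), "spam"),
    ((4.0, 6.0), "spam"),
    ((0.0, 1.0), "ham"),
    ((1.0, 0.0), "ham"),
    ((0.0, 0.0), "ham"),
]

if __name__ == "__main__":
    print(knn_classify((5.0, 5.0), TRAINING))
    print(knn_classify((0.5, 0.5), TRAINING))
```

Because every query is compared with every stored training email, the
classification cost grows with the size of the training set.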
B. Bayesian Classification:
Bayesian classification is based on Bayes' theorem.
Bayesian classifiers can predict class membership
probabilities. A membership probability is the
probability that a given sample belongs to a particular class.
In the case of spam classification, the probability that a
particular word occurs is calculated for spam emails and
for ham emails. From these probabilities a training
dictionary is generated. When a new email arrives, the
probabilities of its words are calculated, and if the spam
probability is greater than the ham probability the email
is classified as spam, otherwise as ham.
Naive Bayes classifier calculates word probability as
follows:
----------Eq. No. (1)
Where,
Pr(S/W) = probability that a message containing the
specified word is spam.
Pr(W/S) = probability that the specified word appears
in a spam message.
Pr(W/H) = probability that the specified word appears in
a ham message.
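By Bayes' theorem, the quantities defined above are related as shown
in the following sketch, where Pr(S | W) stands for Pr(S/W). The prior
probabilities Pr(S) and Pr(H) of spam and ham are an assumption here,
since they are not among the listed definitions; when the two priors
are taken to be equal they cancel, leaving only the listed terms.

```latex
\Pr(S \mid W)
  = \frac{\Pr(W \mid S)\,\Pr(S)}
         {\Pr(W \mid S)\,\Pr(S) + \Pr(W \mid H)\,\Pr(H)}
\qquad\text{and, if } \Pr(S) = \Pr(H):\qquad
\Pr(S \mid W)
  = \frac{\Pr(W \mid S)}{\Pr(W \mid S) + \Pr(W \mid H)}
```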
It has a complexity of O(n) (where n is the total number of
observations).
The drawback of this classifier is that it makes a strong
independence assumption between features, which may not
hold in real-world problems.[2]
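A minimal sketch of the word-probability approach described in this
section is given below. The training dictionary is a pair of word
counters, and the independence assumption shows up as a plain sum of
per-word log probabilities. The add-one (Laplace) smoothing, the
whitespace tokenizer, the sample emails and all function names are
assumptions introduced for the example and are not taken from the
surveyed papers.

```python
import math
from collections import Counter


def train(emails):
    """Build the training dictionary from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in emails:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Label an email by comparing its spam and ham log probabilities."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Start from the log prior of the class.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one smoothing (an assumption) avoids log(0) for unseen words.
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            # Independence assumption: word probabilities simply multiply.
            score += math.log(p)
        scores[label] = score
    return "spam" if scores["spam"] > scores["ham"] else "ham"


# Hypothetical training emails.
TRAINING = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report attached", "ham"),
]
model = train(TRAINING)

if __name__ == "__main__":
    print(classify("claim free money", model))
    print(classify("monday project meeting", model))
```

Training makes a single pass over the observations, which is in line
with the O(n) complexity stated above.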
C. Decision Trees:
A decision tree builds classification models in the form
of a tree structure. It breaks a dataset down into smaller and
smaller subsets while an associated
decision tree is incrementally developed. The final result is a
tree with decision nodes and leaf nodes. A decision node
tests an attribute and branches on its value, and a leaf node
assigns a class, spam or ham. The topmost node is the root
node, which corresponds to the best predictor. ID3 is a
non-incremental algorithm used to build a decision tree from a
fixed set of observations (dataset). The resulting tree is used
to classify test observations. Each observation is represented
by its features or attributes and the class to which it
belongs. ID3 uses the information gain measure to select a
decision node. Information gain indicates the ability of a
given attribute to separate training examples into classes. The
higher the information gain, the better the attribute
separates the training observations. Information gain uses
entropy as a measure of the amount of uncertainty in the
dataset.[2]
ID3 builds a short tree quickly and considers only as many
attributes as are needed to classify the data. However, it
suffers from over-fitting if the training data is small.[2]
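The way ID3 selects a decision node can be illustrated with a short
sketch of entropy and information gain. The Boolean attributes
(has_link, long_subject), the four sample emails and the function
names are hypothetical and chosen only to make the arithmetic easy to
follow.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute):
    """Entropy reduction obtained by splitting rows on one attribute.

    rows is a list of (features_dict, label) pairs.
    """
    base = entropy([label for _, label in rows])
    groups = {}
    for features, label in rows:
        groups.setdefault(features[attribute], []).append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def choose_decision_node(rows, attributes):
    """ID3 picks the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a))


# Hypothetical observations: features plus class label.
ROWS = [
    ({"has_link": True, "long_subject": True}, "spam"),
    ({"has_link": True, "long_subject": False}, "spam"),
    ({"has_link": False, "long_subject": True}, "ham"),
    ({"has_link": False, "long_subject": False}, "ham"),
]

if __name__ == "__main__":
    for name in ("has_link", "long_subject"):
        print(name, information_gain(ROWS, name))
    print(choose_decision_node(ROWS, ["has_link", "long_subject"]))
```

In this toy dataset has_link separates the two classes perfectly, so
it removes all uncertainty, while long_subject removes none and would
never be chosen as the root.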
D. Support Vector Machine (SVM):
SVM is a machine learning algorithm based on
statistical learning theory. SVM is a kernel-based technique
widely used for classification, regression and outlier
detection. The main reason for its increasing importance is
its ability to cast a nonlinear classification problem as a
quadratic programming (QP) problem. Special-purpose
algorithms for solving the QP have since been developed.
Sequential minimal optimization (SMO) has been used for
faster training of the SVM model.
The advantages of SMO are that it works effectively in
high-dimensional spaces and gives good results even
when the number of dimensions is greater than the number
of observations. It is also memory efficient.
The disadvantage of SMO is that it cannot handle a large
dataset and is time-consuming when the input dataset is large.
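For reference, the QP mentioned above is usually written in its dual
form, and SMO solves it by repeatedly optimizing two multipliers at a
time while the others are held fixed. The symbols are assumptions not
used elsewhere in this paper: x_i are the n training vectors, y_i in
{-1, +1} are their labels, alpha_i are the Lagrange multipliers, C is
the regularization constant and K is the kernel function.

```latex
\max_{\alpha}\;
  \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
      \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
  0 \le \alpha_i \le C \;\; (i = 1, \dots, n),
  \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```

The double sum runs over all pairs of training instances, which is one
way to see why training cost and memory grow quickly with dataset
size.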
Fig-1 Hyperplanes that separate the two classes
SVM Working:
1. Every mail instance is treated as a single point with
n dimensions in hyperspace.
2. The distance between the hyperplane and the nearest
points of each class is kept at a maximum for good
separation. In Fig-1, Plane 1 is a good classifier, whereas
Plane 2 does not classify all instances.
3. It may also happen that no good separating
hyperplane such as Plane 1 in Fig-1 can be found. In
that case the hyperspace is said to be non-linearly
separable.
4. To obtain linear separation in a non-linearly
separable hyperspace, it is extended to more
dimensions.[1]
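Step 4 can be illustrated with a small sketch. The one-dimensional
data below, in which spam points lie between ham points, cannot be
split by any single threshold. After each point x is extended to the
two-dimensional point (x, x^2), a straight line separates the classes.
The data, the explicit feature map and the threshold value are
assumptions chosen for illustration; an SVM kernel performs such a
mapping implicitly.

```python
def linearly_separable_1d(points):
    """True if one threshold on x separates the two labels.

    points is a list of (x, label) pairs with distinct x values.
    """
    labels = [label for _, label in sorted(points)]
    # Separable only if the label changes at most once along the axis.
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes <= 1


def feature_map(x):
    """Extend a 1-D point to 2-D by adding x squared as a second dimension."""
    return (x, x * x)


def classify_mapped(x, threshold=2.0):
    """Linear separator in the mapped space: second coordinate vs. threshold."""
    _, x_squared = feature_map(x)
    return "ham" if x_squared > threshold else "spam"


# Hypothetical 1-D instances: spam sits between two groups of ham.
POINTS = [
    (-3.0, "ham"), (-2.0, "ham"),
    (-0.5, "spam"), (0.5, "spam"),
    (2.0, "ham"), (3.0, "ham"),
]

if __name__ == "__main__":
    print(linearly_separable_1d(POINTS))
    print(all(classify_mapped(x) == label for x, label in POINTS))
```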
3. PROPOSED SYSTEM
The proposed system contains the following modules:
3.1 Admin Module
The admin has complete rights over the system. The
admin can view the lists of valid and invalid (i.e. spam) mail.
• Add training e-mail: This module lets the
admin specify sample e-mail content containing a
valid or spam email. This email is further
analyzed to store training parameters as valid
or spam email.
• Training module: In this module, the admin specifies
sample training data for valid and spam mails and
divides the training set into 2 random sets of
emails on which training is performed. The
result is displayed as a valid email list or a
spam email list. Training is based on the
MapReduce with SVM model.
Fig.2 Architecture of proposed system
3.2 User Module
The user is an individual accessing the
application. The user can send and
receive emails within the domain of the app (local
domain) and view the Inbox and Spam emails.
• Account authentication: Through this feature, a
user is authenticated into the system. The login
credentials of the user must match the information
stored in the application. On providing valid
credentials, the user is logged in to the system.
• Sending mails: The user of the application has the
right to send a regular text email to a
receiver of their choice.
• Receiving mails: The user of the application has the
right to receive emails from other senders in the
application.
• View Inbox: The user of the application has the right to
view the Inbox.
• View Spam: The user of the application has the right to
view Spam.
• Spam analysis detection: Whenever a user sends
an email, it passes through the spam
analysis detection stage, which is a vital module
used for spam detection, based on which the system
generates a list of valid and spam mails
from the current user.
3.3 Spam Analysis based on MapReduce with
SVM model
In this paper, an M/R based distributed SVM
algorithm for scalable spam filter training, designated
MRSMO, is presented. By distributing and processing subsets
of the training data across multiple participating computing
nodes, the distributed SVM reduces spam filter training time
significantly. This approach also drastically reduces the
memory required to store the matrix of input mail
instances. With MapReduce, the large input e-mail file for
training the SVM is split into small data chunks. Input data
chunks are treated as Key (K) and Value (V) pairs. In the case
of spam filtering, the key is the label (spam/ham) and the
value is the weight (the 10 most highly weighted keywords).
Each map task works with one single data chunk,
so the number of data chunks is normally equal to the number
of map tasks. Fig. 2 shows the splitting of the input file and
the working of the MapReduce paradigm. Each map task runs
serial SMO independently on its respective training set of
e-mails. The output produced by the map tasks is passed through
shufflers, combiners, sorters, etc. to group identical keys (K)
near each other. MapReduce gives its output in the form of
{key, value} pairs. The Reduce task takes as input the
{Label, Value} pairs generated by each map task and combines
the results of all map tasks to produce the final output.[1,2]
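The split, map, shuffle and reduce flow described above can be
simulated in a single process as a rough sketch. It deliberately does
not train an SVM: each map task only counts keywords per label, which
stands in for the per-chunk SMO training, and the reduce task merges
the counts and keeps the top keywords for each label. The chunk size,
the sample emails and every function name are hypothetical; a real
deployment would run the map tasks in parallel on a cluster framework
such as Hadoop.

```python
from collections import Counter, defaultdict


def split_into_chunks(records, chunk_size):
    """Split the input into fixed-size chunks, one per map task."""
    return [records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]


def map_task(chunk):
    """Emit one (label, keyword counts) pair per email in the chunk."""
    return [(label, Counter(text.lower().split())) for text, label in chunk]


def shuffle(mapped_outputs):
    """Group the values emitted by all map tasks by key (label)."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_task(key, values, top_n=10):
    """Merge the counts for one label and keep its top_n keywords."""
    total = Counter()
    for counts in values:
        total.update(counts)
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return key, [word for word, _ in ranked[:top_n]]


def run_job(records, chunk_size=2, top_n=10):
    chunks = split_into_chunks(records, chunk_size)
    mapped = [map_task(chunk) for chunk in chunks]  # sequential stand-in
    grouped = shuffle(mapped)
    return dict(reduce_task(k, v, top_n) for k, v in grouped.items())


# Hypothetical labelled emails.
EMAILS = [
    ("free money free prize", "spam"),
    ("meeting on monday", "ham"),
    ("claim free prize", "spam"),
    ("monday report", "ham"),
]

if __name__ == "__main__":
    print(run_job(EMAILS, chunk_size=2, top_n=2))
```

Because each map task sees only its own chunk, the memory needed per
task depends on the chunk size rather than on the size of the whole
training file.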
Fig.3 MapReduce Architecture
4. COMPARISON OF CLASSIFICATION ALGORITHMS
WITH MAPREDUCE

Classification Algorithm | Traditional Model (Without MapReduce) | With MapReduce Model
SVM | 1. Low scalability. 2. More time required to train the filter. 3. Works best with small input datasets. | 1. High scalability. 2. Speeds up the training process 6 times compared to the traditional approach. 3. Also works well with large datasets.
Bayesian Classification | 1. Strong independence assumptions between the features limit accuracy. 2. Low scalability. | 1. Higher spam detection accuracy and speed. 2. High scalability.
Clustering | 1. Computation cost is very high because the distance from each query instance to all training sample emails must be calculated. 2. Cannot handle large datasets. | 1. Computation cost is very low because all instances are tested efficiently in parallel. 2. High scalability, using datasets of up to 1 million instances.[3]
5. CONCLUSION
Traditional SVM without MapReduce gives low
scalability and requires high computation time as well as cost.
The MapReduce framework is designed to provide high
scalability, speedup and high accuracy. In a similar way,
the performance of Bayesian and clustering algorithms increases
with the MapReduce model. However, the decision tree shows
limitations
when implemented with MapReduce because of its irregular nature.
Its performance depends on a large number of attributes,
and it shows only a small performance gain. We therefore
conclude that SVM with the MapReduce framework gives the most
satisfactory results in spam filtering.
6. ACKNOWLEDGEMENT
I take this opportunity to express my profound gratitude and
deep regards to my guide, Prof. Kanchan Doke, for her
guidance, monitoring and constant encouragement
throughout this paper. I would also like to express my deepest
appreciation to all staff members of BVCOE, Navi
Mumbai.
7. REFERENCES
[1] "Spam Filtering Techniques & Map Reduce with SVM",
Amol G. Kakade, Prashant K. Kharat, Anil Kumar Gupta,
Tarun Batra, Department of Information Technology, 2014
Asia-Pacific Conference on Computer Aided System
Engineering (APCASE), 978-1-4799-4568-9/14/$31.00
©2014 IEEE
[2] "A Survey and Evaluation of Supervised Machine
Learning Techniques for Spam E-Mail Filtering", Tarjani
Vyas, Payal Prajapati, Somil Gadhwal, Institute of
Technology, IEEE International Conference on Electronics,
Computers & Communication Technologies,
978-1-4799-6085-9/15/$31.00 ©2015 IEEE
[3] "A MapReduce-based k-Nearest Neighbor Approach for
Big Data Classification", Jesus Maillo, Isaac Triguero,
Francisco Herrera, IEEE Trustcom/BigData/ISPA,
978-1-4673-7952-6/15/$31.00 ©2015 IEEE
[4] "A MapReduce Implementation of C4.5 Decision Tree
Algorithm", Wei Dai, Wei Ji, International Journal of Database
Theory and Application, Vol. 7, No. 1 (2014), pp. 49-60
[5] "Survey on Spam Filtering Techniques", Saadat Nazirova,
Institute of Information Technology of Azerbaijan National
Academy of Sciences, Communications and Network, 2011, 3,
153-160
[6] "A Taxonomy of Email Spam Filters", Hasan
Shojaa Alkahtani, Paul Gardner-Stephen, Robert
Goodwin, Computer Science Department, College
of Computer Science and Information Technology, King
Faisal University, Saudi Arabia
[7] "Origin (Dynamic Blacklisting) Based Spammer
Detection and Spam Mail Filtering Approach", Nikhil
Aggrawal, Shailendra Singh, Dept. of Computer Science,
IEEE international paper on Computer & Networks, ISBN:
978-1-4673-9379-9 ©2016 IEEE
[8] "Evaluation of Deceptive Mails Using Filtering
& WEKA", Sujeet More, Ravi Kalkundri, IEEE Sponsored 2nd
International Conference on Innovations in Information
Embedded and Communication Systems (ICIIECS'15),
978-1-4799-6818-3/15/$31.00 ©2015 IEEE
[9] "Spam Filtering by Semantics-based Text
Classification", Wei Hu, Jinglong Du, Yongkang Xing, 8th
International Conference on Advanced Computational
Intelligence, Chiang Mai, Thailand, February 14-16, 2016,
978-1-4673-7782-9/16/$31.00 ©2016 IEEE
[10] "Spam Classification Based on Supervised Learning by
Machine Learning Techniques", Ms. D. Karthika Renuka, Dr.
T. Hansapriya, Mr. M. Raja Chakravarthi, Ms. P. Lakshmi
Surya, IEEE international paper on Data Mining,
978-1-61284-764-1/11/$26.00 ©2011 IEEE

Implementation of Spam Classifier using Naïve Bayes Algorithm
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
IRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
IRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
IRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
IRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
IRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
IRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
IRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
IRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
Kamal Acharya
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
PrashantGoswami42
 
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE
TECHNICAL TRAINING MANUAL   GENERAL FAMILIARIZATION COURSETECHNICAL TRAINING MANUAL   GENERAL FAMILIARIZATION COURSE
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE
DuvanRamosGarzon1
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
Kamal Acharya
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
Kamal Acharya
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 

Recently uploaded (20)

block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE
TECHNICAL TRAINING MANUAL   GENERAL FAMILIARIZATION COURSETECHNICAL TRAINING MANUAL   GENERAL FAMILIARIZATION COURSE
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 

A Survey on Spam Filtering Methods and Mapreduce with SVM

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 03 | Mar-2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 490

A Survey on Spam Filtering Methods and Mapreduce with SVM
Dipika Somvanshi1, Prof. Kanchan Doke2
1 ME (Comp.) Student, Bharati Vidyapeeth College of Engineering, Kharghar, Navi Mumbai, India
2 Professor, Computer Dept., Bharati Vidyapeeth College of Engineering, Kharghar, India

Abstract - Spam is any unwanted and harmful mail sent in bulk to a large number of recipients. Spam can be harmful as it may contain malware and links to phishing websites, so separating spam from normal mail into a separate folder is essential. Techniques to separate spam mails are word based, content based, machine learning based and hybrid. Machine learning techniques are the most popular because of their high accuracy and mathematical support. This paper surveys different spam filtering techniques. SVM is a popular machine learning technique in spam filtering because it can handle data with a large number of attributes.

SVM requires more time to train and cannot work with a large training dataset; these drawbacks can be minimized by introducing the MapReduce framework for SVM. The MapReduce framework can work in parallel on chunks of the input dataset file to train the SVM, reducing training time. This paper aims at surveying a few such spam filtering techniques and the scope for introducing MapReduce with SVM.

Key Words: Spam Filtering; Machine Learning Techniques; Naïve Bayes; KNN; Decision Trees; SVM; MapReduce

1. INTRODUCTION

The email system is one of the most used communication tools. Email is a quick means of communication because the sender does not have to wait for a response, and it is a straightforward way to stay in touch with everyone.

One major threat to an email system is spam e-mail. Spam e-mail is unwanted mail sent in bulk by spammers for their own advantage. Spammers are groups of people who intend to spread malicious content, advertisements and links to phishing websites through email. Spam emails overload server bandwidth and storage, and add cost and time for separating spam emails from ham emails. According to the SMX email security provider, the live spam percentage is about 79.5%. The average size of a spam message is 16 KB [1]. So classification of emails into spam and ham is a most important issue, and spam filtering is needed to separate spam from important mails.

Various spam filtering techniques exist in the literature. They are classified as machine learning based, content based, list based and hybrid. Amongst them, machine learning techniques give more accurate results due to their mathematical background. Machine learning techniques work with data mining algorithms and give more satisfactory results. For spam filtering, filters are trained with algorithms on a sample data set of emails and then tested on a new sample of emails. Machine learning based spam classification algorithms include SVM, Naïve Bayes, KNN, Decision Trees, etc. Amongst these, Naïve Bayesian classification and Support Vector Machine are the most used and appreciated by researchers. A number of freeware and paid tools for spam filtering are also available, and they too make use of these techniques.

Support Vector Machines (SVM) can be applied efficiently in spam filtering. SVM works with a kernel function and gives the most satisfactory results in spam filtering. SVM works best with a small set of input data, but its performance degrades as the size of the dataset increases, and it requires a long time to train the filter. So this issue needs to be addressed. MapReduce can be effectively used for training on large input datasets.
MapReduce can process large data in less time. So in the proposed system we have used MapReduce with SVM to classify a large set of emails into spam and ham. MapReduce with SVM gives more speedup than the traditional SVM algorithm.

2. LITERATURE REVIEW OF MACHINE LEARNING TECHNIQUES

A. Clustering:
Clustering is used for separating objects into related classes called clusters. It groups objects or observations in such a way that objects in one group are more similar to each other than to those in other groups.[2]

K-Nearest Neighbor: It is one of the simplest machine learning algorithms. KNN works with a 'characteristic vector' for each message. The characteristic vectors are used to measure similarity among messages. In this algorithm a new incoming email is classified on the basis of the distance of that mail from both classes. The distance can be calculated as the Euclidean distance. KNN consists of two phases.
Training phase: Storing the feature vectors and class labels of the training emails.
Classification phase: k is a user-defined constant, and an unlabeled email is classified by assigning the label which is most frequent among the k training email samples nearest to the new incoming email.

B. Bayesian Classification:
Bayesian classification is based on Bayes' theorem. Bayesian classifiers can predict class membership probabilities. A membership probability is the probability that a given sample belongs to a particular class. In the case of spam classification, the probability that a particular word occurs in spam emails and in ham emails is calculated, and a training dictionary is generated from these values. When a new email arrives, the probabilities of its words are calculated, and if the spam probability is higher than the ham probability the email is classified as spam, else as ham. The Naive Bayes classifier calculates the word probability as follows:

\[ \Pr(S \mid W) = \frac{\Pr(W \mid S)\,\Pr(S)}{\Pr(W \mid S)\,\Pr(S) + \Pr(W \mid H)\,\Pr(H)} \qquad \text{Eq. No. (1)} \]

Where,
Pr(S|W) = probability that a message is spam, given that the word W appears in it.
Pr(W|S) = probability that the specified word appears in a spam message.
Pr(W|H) = probability that the specified word appears in a ham message.
Pr(S), Pr(H) = overall (prior) probabilities that a message is spam or ham.

It has a complexity of O(n), where n is the total number of observations. The drawback of this classifier is that it makes a strong independence assumption between features, which may not hold in real-world problems.[2]

C. Decision Trees:
A decision tree builds classification models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node has two branches, spam or ham. The topmost node is the root node, which corresponds to the best predictor. ID3 is a non-incremental algorithm used to build a decision tree from a fixed set of observations (dataset). The resulting tree is used to classify test observations. Each observation is represented by its features or attributes and the class to which it belongs. ID3 uses the information gain measure to select a decision node. Information gain indicates the ability of a given attribute to separate training examples into classes: the higher the information gain, the higher the ability of the attribute to separate training observations. Information gain uses entropy as a measure of the amount of uncertainty in the dataset.[2]
ID3 builds a short tree quickly and considers only as many attributes as are needed to classify the data. But it suffers from over-fitting if the training data set is small.[2]

D. Support Vector Machine (SVM):
SVM is a machine learning algorithm based on statistical learning theory. SVM is a kernel-based technique widely used for classification, regression and outlier detection. The main reason for its increasing importance is its ability to cast a nonlinear classification problem as a quadratic programming (QP) problem. Nowadays special-purpose algorithms are being developed for solving QP. Sequential minimal optimization (SMO) has been used for faster training of the SVM model. The advantages of SMO are that it works effectively in high-dimensional spaces, gives good results when the number of dimensions is greater than the number of observations, and is memory efficient. The disadvantage of SMO is that it cannot handle large data sets and is time consuming when the input dataset is large.
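The per-word probability used by the Bayesian classifier in section B can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the standard two-class form of Bayes' theorem, Pr(S|W) = Pr(W|S)Pr(S) / (Pr(W|S)Pr(S) + Pr(W|H)Pr(H)), estimates Pr(W|S) and Pr(W|H) as the fraction of spam and ham training messages that contain the word, and takes the prior Pr(S) as a parameter. The function name, the default prior of 0.5 and the toy messages are all hypothetical.

```python
def word_spam_probability(word, spam_msgs, ham_msgs, prior_spam=0.5):
    """Estimate Pr(S|W) for one word with Bayes' theorem.

    spam_msgs and ham_msgs are lists of token sets, one set per message.
    prior_spam is the assumed prior Pr(S); Pr(H) is taken as 1 - Pr(S).
    """
    # Pr(W|S): fraction of spam training messages that contain the word
    p_w_given_s = sum(word in msg for msg in spam_msgs) / len(spam_msgs)
    # Pr(W|H): fraction of ham training messages that contain the word
    p_w_given_h = sum(word in msg for msg in ham_msgs) / len(ham_msgs)

    prior_ham = 1.0 - prior_spam
    numerator = p_w_given_s * prior_spam
    denominator = numerator + p_w_given_h * prior_ham
    if denominator == 0:
        # The word never occurs in the training data: fall back to the prior.
        return prior_spam
    return numerator / denominator


# Toy training data (hypothetical), already tokenised into sets of words.
spam = [{"win", "money", "now"}, {"free", "money"}, {"win", "prize"}]
ham = [{"meeting", "now"}, {"project", "report"},
       {"money", "transfer", "report"}, {"lunch"}]

for w in ("money", "lunch", "unseen"):
    print(w, round(word_spam_probability(w, spam, ham), 3))
```

With these toy messages, "money" appears in 2 of 3 spam messages and 1 of 4 ham messages, so its score is 8/11, about 0.727. A complete filter would combine such per-word values over all words of a message, which is where the independence assumption mentioned in section B comes in.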
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 03 | Mar -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 492 Fig-1Hyper plane that separate the two classes SVM Working: 1. Every mail instance is treated as a single point with n dimensions in hyperspace. 2. The distance betweenthehyperplanesandpointsof each class, is kept maximum, for good separation. Here in fig-1, Plane1 is good classifier and Plane2 doesn’t classify all instances. 3. It may also happen that, we can’t find good separator hyperplane (Plane 1) as in fig.1. In suchcase, hyper space is called as non-linearly separable. 4. To obtain linear separation in the non-linearly separable hyperspace, it is extended to more dimensions.[1] 3. PROPOSED SYSTEM Proposed system contains following modules: 3.1 Admin Module The admin has the complete right over the system. The admin can view List of Valid and Invalid i.e spam mail.  Add training E-mail:This Module will facilitate Admin to specify sample e-mail content containing valid or spam email. This email will be further analyzed to store some training parametersasvalid or spam email.  Training Module:InthisModule,Adminwill specify sample training data for valid and spam mails and divide training set of emails into 2 random set of emails on which training will be performed and result will be displayed in list as valid email list or spam email list. Training is provided based on MapReduce with SVM model. Fig.2 Architecture of proposed system 3.2 User Module The user can be defined as an individual accessing the application. User has the features of sending and receiving the emails within the domain of app (local domain) and viewing Inbox and Spam Emails  Account authentication:Through this feature, a user is authenticated into the system. 
The login credentials of the user must match the information stored in the application. On successfully providing the credentials, the user is logged in to the system.
• Sending mails: The user of the application has the right to send a regular text email to a receiver of their choice.
• Receiving mails: The user of the application has the right to receive emails from other senders in the application.
• View Inbox: The user of the application has the right to view the Inbox.
• View Spam: The user of the application has the right to view the Spam folder.
• Spam Analysis Detection: Whenever a user sends an email, it passes through the Spam Analysis Detection stage, which is a vital module
used for spam detection, based on which the system generates a list containing the valid and spam mails of the current user.
3.3 Spam Analysis based on MapReduce with SVM model
In this paper, an M/R-based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. This approach also drastically reduces the memory required to store the matrix of input mail instances. With MapReduce, the large input e-mail file used for training the SVM is split into small data chunks. Input data chunks are treated as Key (K) and Value (V) pairs. In the case of spam filtering, the key is the label (spam/ham) and the value is the weight (the 10 most highly weighted keywords). Each map task works with one single data chunk, so the number of data chunks is normally equal to the number of map tasks. Fig. 3 shows the splitting of the input file and the working of the MapReduce paradigm. Each Map task runs serial SMO independently on its own training subset. The output produced by the map tasks is passed through shufflers, combiners, sorters, etc. to group identical keys (K) together. MapReduce gives its output in the form of {key, value} pairs. The Reduce task takes as input the {Label, Value} pairs generated by each Map task. It then combines the results of all Map tasks to produce the final output.[1,2]
Fig.3 MapReduce Architecture
4. COMPARISON OF CLASSIFICATION ALGORITHMS WITH MAPREDUCE
Classification Algorithm | Traditional Model (Without MapReduce) | With MapReduce Model
SVM | 1. Low scalability 2. More time required to train the filter 3. Works best with small input datasets | 1. High scalability 2. Speeds up the training process 6 times compared to the traditional approach 3. Works well with large datasets also
Bayesian Classification | 1. Strong independence assumptions between the features limit accuracy 2. Low scalability | 1. Higher spam detection accuracy and speed 2. High scalability
Clustering | 1. Computation cost is very high because the distance of each query instance to all training sample emails must be calculated 2. Can't handle large datasets | 1. Computation cost is very low as all instances are tested efficiently in parallel 2. High scalability using datasets of up to 1 million instances.[3]
5. CONCLUSION
Traditional SVM without MapReduce gives low scalability and requires high computation time as well as cost. The MapReduce framework is designed to provide high scalability, speedup and high accuracy. Similarly, the performance of the Bayesian and clustering algorithms increases with the MapReduce model. However, the decision tree shows limitations
while implementing MapReduce due to its irregular nature. Its performance depends on a large number of attributes, and it shows only a small performance gain. So, we conclude that SVM with the MapReduce framework gives the most satisfactory results in spam filtering.
6. ACKNOWLEDGEMENT
I take this opportunity to express my intense gratitude and deep regards to my guide, Prof. Kanchan Doke, for her guidance, monitoring and constant encouragement throughout this paper. I would like to express my deepest appreciation towards all staff members of BVCOE, Navi Mumbai.
7. REFERENCES
[1] "Spam Filtering Techniques & Map Reduce with SVM", Amol G. Kakade, Prashant K. Kharat, Anil Kumar Gupta, Tarun Batra, Department of Information Technology, 2014 Asia-Pacific Conference on Computer Aided System Engineering (APCASE), 978-1-4799-4568-9/14/$31.00 ©2014 IEEE
[2] "A Survey and Evaluation of Supervised Machine Learning Techniques for Spam E-Mail Filtering", Tarjani Vyas, Payal Prajapati, Somil Gadhwal, Institute of Technology, IEEE International Conference on Electronics, Computers & Communication Technologies, 978-1-4799-6085-9/15/$31.00 ©2015 IEEE
[3] "A MapReduce-based k-Nearest Neighbor Approach for Big Data Classification", Jesus Maillo, Isaac Triguero, Francisco Herrera, IEEE Trustcom/BigData/ISPA, 978-1-4673-7952-6/15 $31.00 ©2015 IEEE
[4] "A MapReduce Implementation of C4.5 Decision Tree Algorithm", Wei Dai, Wei Ji, International Journal of Database Theory and Application, Vol.7, No.1 (2014), pp.49-60
[5] "Survey on Spam Filtering Techniques", Saadat Nazirova, Institute of Information Technology of Azerbaijan National Academy of Sciences, Communications and Network, 2011, 3, 153-160
[6] "A Taxonomy of Email Spam Filters", Hasan Shojaa Alkahtani, Paul Gardner-Stephen, Robert Goodwin, Computer Science Department, College of Computer Science and Information Technology, King Faisal University, Saudi Arabia
[7] "Origin (Dynamic Blacklisting) Based Spammer Detection and Spam Mail Filtering Approach", Nikhil Aggrawal, Shailendra Singh, Dept. of Computer Science, IEEE international paper on Computer & Networks, ISBN: 978-1-4673-9379-9 ©2016 IEEE
[8] "Evaluation of Deceptive Mails Using Filtering & WEKA", Sujeet More, Ravi Kalkundri, IEEE Sponsored 2nd International Conference on Innovations in Information Embedded and Communication Systems ICIIECS'15, 978-1-4799-6818-3/15/$31.00 ©2015 IEEE
[9] "Spam Filtering by Semantics-based Text Classification", Wei Hu, Jinglong Du, Yongkang Xing, 8th International Conference on Advanced Computational Intelligence, Chiang Mai, Thailand, February 14-16, 2016, 978-1-4673-7782-9/16/$31.00 ©2016 IEEE
[10] "Spam Classification Based on Supervised Learning by Machine Learning Techniques", Ms. D. Karthika Renuka, Dr. T. Hansapriya, Mr. M. Raja Chakravarthi, Ms. P. Lakshmi Surya, IEEE international paper on Data Mining, 978-1-61284-764-1/11/$26.00 ©2011 IEEE
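The Map and Reduce steps of Section 3.3 can be illustrated with a small single-machine sketch. It is not the MRSMO algorithm: each map task here trains a linear SVM by sub-gradient descent on the hinge loss as a stand-in for serial SMO, and the reduce step simply averages the per-chunk weights, which is an assumed combination rule. The two-feature layout (spam-keyword count, ham-keyword count), the toy data, the hyperparameters and all function names are hypothetical.

```python
from functools import reduce
from typing import List, Tuple

Example = Tuple[List[float], int]        # (feature vector, label: +1 spam, -1 ham)
Model = Tuple[List[float], float, int]   # (weights, bias, number of chunks merged)


def split_into_chunks(data: List[Example], n_chunks: int) -> List[List[Example]]:
    """Split the training file into n_chunks interleaved data chunks."""
    return [data[i::n_chunks] for i in range(n_chunks)]


def train_linear_svm(chunk: List[Example], epochs: int = 200,
                     lr: float = 0.1, lam: float = 0.01) -> Model:
    """Map task: fit a linear SVM on one chunk (hinge loss, sub-gradient descent).

    This replaces the serial SMO solver of the paper with a simpler trainer.
    """
    w = [0.0] * len(chunk[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in chunk:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]   # L2 regularization shrink
            if margin < 1.0:                          # point violates the margin
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return (w, b, 1)


def merge_models(m1: Model, m2: Model) -> Model:
    """Reduce task: combine two partial models by a chunk-weighted average."""
    w1, b1, n1 = m1
    w2, b2, n2 = m2
    n = n1 + n2
    w = [(a * n1 + c * n2) / n for a, c in zip(w1, w2)]
    return (w, (b1 * n1 + b2 * n2) / n, n)


def predict(model: Model, x: List[float]) -> int:
    """Return +1 (spam) or -1 (ham) for one feature vector."""
    w, b, _ = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


def train_with_mapreduce(data: List[Example], n_chunks: int) -> Model:
    """Run the map step on every chunk, then fold the results with reduce."""
    chunks = split_into_chunks(data, n_chunks)
    partial_models = list(map(train_linear_svm, chunks))
    return reduce(merge_models, partial_models)


# Hypothetical toy training set: [spam-keyword count, ham-keyword count].
DATA: List[Example] = [
    ([3.0, 0.0], 1), ([4.0, 1.0], 1),
    ([0.0, 3.0], -1), ([1.0, 4.0], -1),
    ([5.0, 0.0], 1), ([3.0, 1.0], 1),
    ([0.0, 5.0], -1), ([1.0, 3.0], -1),
]
```

Calling `train_with_mapreduce(DATA, 2)` trains one model per chunk and merges them. In a real deployment the built-in `map` would be replaced by distributed map tasks, one per data chunk, which is where the training-time reduction described in Section 3.3 would come from.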