International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 429
Overview of Anti-spam filtering Techniques
Sushma L. Wakchaure, Shailaja D. Pawar, Ganesh D. Ghuge, Bipin B. Shinde
Amrutvahini Polytechnic, Sangamner
Professor, Dept. of Computer Engineering, Amrutvahini Polytechnic College, Maharashtra, India.
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Electronic mail (E-mail) is an essential
communication tool that has been greatly abused by
spammers to disseminate unwanted information (messages)
and spread malicious contents to Internet users. Current
Internet technologies further accelerated the distribution of
spam. Spam-reduction techniques have developed rapidly over
the last few years, as spam volumes have increased. We believe
that no one anti-spam solution is the “right” answer, and that
the best approach is a multifaceted one, combining various
forms of filtering with infrastructure changes, financial
changes, legal recourse, and more, to provide a stronger
barrier to spam than can be achieved with one solution alone.
SpamGuru addresses the part of this multi-faceted approach
that can be handled by technology on the recipient’s side,
using plug-in tokenizers and parsers, plug-in classification
modules, and machine-learning techniques to achieve high
hit rates and low false-positive rates. Effective controls need to
be deployed to counter the ever-growing spam
problem. Machine learning provides better protective
mechanisms that are able to control spam. This paper
summarizes most common techniques used for anti-spam
filtering by analyzing the e-mail content and also looks into
machine learning algorithms such as Naïve Bayesian, support
vector machine and neural network that have been adopted to
detect and control spam. Each machine learning technique has its own
strengths and limitations; as such, appropriate preprocessing
needs to be carefully considered to increase the effectiveness of
any given machine learning technique.
Key Words: Anti-spam filters, text categorization,
electronic mail (E-mail), SpamBayes, training email
filters, content filtering, false negatives, user-level spam
filtering, machine learning
1. INTRODUCTION
Spam-reduction techniques have developed rapidly over the
last few years, as spam volumes have increased. We believe
that the spam problem requires a multi-faceted solution that
combines a broad array of filtering techniques with various
infrastructural changes, changes in financial incentives for
spammers, legal approaches, and more [1]. This paper
describes one part of a more comprehensive anti-spam
research effort undertaken by us and our colleagues:
SpamGuru, a collaborative anti-spam filter that combines
several learning, tokenization, and user interface elements to
provide enterprise-wide spam protection with high spam
detection rates and low false-positive rates.
E-mail, or electronic mail, is an electronic messaging system that
transmits messages across computer networks. Users simply
type in the message, add the recipient’s e-mail address(es)
and click the send button. Users can access any free e-mail
service such as Yahoo Mail, Gmail, or Hotmail, or register with
ISPs (Internet Service Providers) in order to obtain an e-mail
account at no cost except for the Internet connection
charges. Besides that, e-mail is received almost
immediately by the recipient once it is sent out. E-mail
allows users to communicate with each other at a low cost as
well as provides an efficient mail delivery system. The
reliability, user-friendliness and availability of a wide range
of free e-mail services make it a popular and preferred
communication tool. As such, businesses and individual
users alike rely heavily on this communication tool to share
information and knowledge. Businesses can drastically cut
down on communication costs since e-mail is extremely fast
and inexpensive; furthermore, it is a very powerful
marketing tool. Businesses can capitalize on this
technology since it is a very popular advertising tool.
However, the simplicity of sending e-mail and its almost non-
existent cost pose another problem: spam. Spam refers to
bulk unsolicited commercial e-mail sent indiscriminately to
users. Table 1 enumerates some of them. Based on Ferris
Research (2009), spam can be categorized into the following:
1. Health; such as fake pharmaceuticals;
2. Promotional products; such as fake fashion items (for example, watches);
3. Adult content; such as pornography and prostitution;
4. Financial and refinancing; such as stock kiting, tax solutions, loan packages;
5. Phishing and other fraud; such as “Nigerian 419” and “Spanish Prisoner”;
6. Malware and viruses; such as Trojan horses attempting to infect your PC with malware;
7. Education; such as online diplomas;
8. Marketing; such as direct marketing material, sexual enhancement products;
9. Political; such as US presidential votes.
2. E-MAIL STRUCTURE
E-mail messages are divided into 2 parts: Header
information and message body. Header information or the
header field consists of information about the message’s
transportation, which generally shows the following
information:
1. From: displays the sender’s details, such as the e-mail address;
2. To: displays the receiver’s details, such as the e-mail address;
3. Date: displays the date the e-mail was sent to the recipient;
4. Received: intermediary servers’ information and the date the e-mail message was processed;
5. Reply-To: the reply address;
6. Subject: the subject of the message specified by the sender;
7. Message-ID: the unique ID of the message; and others.
The message body contains the message of the e-mail. E-mail
messages are presented in plain text or HTML. An e-mail
may also have attachments such as graphics, video or other
format types; to facilitate these attachments, MIME
(Multipurpose Internet Mail Extensions) is used.
SPAMMER TRICKS
In order to send spam, spammers first obtain e-mail
addresses by harvesting addresses through the Internet
using specialized software. This software systematically
gathers e-mail addresses from discussion groups or websites
(Schaub, 2002). Spammers are also able to
purchase or rent collections of e-mail addresses from other
spammers or service providers. Table 2 indicates the many
tricks used by spammers to avoid detection by spam filters.
Figure 1: Email Structure
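The header/body split described in this section can be inspected with Python's standard `email` package. The following is a minimal sketch; the sample message, its addresses, and the helper name `split_message` are invented for illustration and are not taken from the paper.

```python
from email import message_from_string
from email.policy import default

# A made-up message: header fields, a blank line, then the body.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.org\n"
    "Date: Mon, 02 Jan 2017 10:00:00 +0000\n"
    "Subject: Meeting agenda\n"
    "Message-ID: <1234@example.com>\n"
    "\n"
    "Please find the agenda below.\n"
)

def split_message(raw):
    """Return (headers, body) for a simple, non-multipart message."""
    msg = message_from_string(raw, policy=default)
    headers = {name: str(value) for name, value in msg.items()}
    # Multipart (MIME) messages keep their content in sub-parts;
    # this sketch only handles the single-part case.
    body = "" if msg.is_multipart() else msg.get_content()
    return headers, body

headers, body = split_message(RAW)
print(headers["Subject"])
print(body.strip())
```

A content-based filter would typically build its features from both parts, for example the Subject field and the words of the body.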
3. SPAM CONTROL TECHNIQUES
Anti-spam techniques and methods try to tell a spam
email apart from a legitimate email. As a typical email consists of a few
components such as the header, the body and attachments,
the algorithms that classify emails may use different features
of the mail components to make decisions about them. A lot of
work has gone into finding solutions to the spam problem from
different dimensions and directions (Islam and Zhou, 2007,
Zhang et al., 2012, Xiao-wei and Zhong-feng, 2012, Rajendran
and Pandey, 2012, du Toit and Kruger, 2012, Xiao et al., 2010,
Wei et al., 2010, So Young and Shin Gak, 2008, Klonowski and
Strumiński, 2008, Horie and Neville, 2008, Nhung and
Phuong, 2007, McGibney and Botvich, 2007, Liu et al., 2007,
Huai-bin et al., 2005, Moon et al., 2004, Wu and Tsai) over the
last decade. Various anti-spam solutions are available and
have been surveyed by many researchers (Blanzieri and Bryl,
2008, Caruana and Li, 2012, Guzella and Caminhas, 2009, Lai,
2007, Yu and Xu, 2008, Paswan et al., 2012, Nazirova, 2011).
These include blacklists, whitelists, greylists, content-based
filtering, feature selection methods, bag-of-words, machine
learning techniques (such as Naïve Bayes, support vector
machines, artificial neural networks, lazy learning, etc.),
reputation-based techniques, artificial immune systems,
protocol-based procedures, and so on. Caruana and Li (2012)
also list some emerging approaches such as peer-to-peer
computing, grid computing, social networks and
ontology-based semantics, along with a few other approaches. These
solutions can be grouped into various categories such as
list-based techniques and filtering techniques; another
categorisation is prevention, detection and reaction
techniques (Nakulas et al., 2009). Paswan et al. (2012)
categorise email spam filtering techniques as
origin-based spam filtering, content-based filtering, feature selection
methods, feature extraction methods, and traffic-based
filtering. The scope of this paper is content-based filtering and,
specifically, learning-based filters. Hence, we will not go into
detail on each of these solutions but limit ourselves to the Bayes
algorithm. Spammers are insensitive to the consequences of
their activities and need to be dissuaded by being made to
pay the Internet service providers for the waste of
bandwidth occupied by unwanted spam blocked by the
servers. This would be a feasible deterrent to reduce spam.
To execute this, all service providers must act in
unison and agree
to get spammers to pay for the inconvenience of spam and the
clean-up of servers.
Figure 2: Controlling of Spam mail
4. USER PREFERRED DOMAIN SPECIFIC TRAINING OF FILTER
In this work, we have been able to identify different types of
anti-spam techniques, exemplified by the use of filters and
other characteristic means to deter spammers. Although
these anti-spam techniques may be suitable for some users
and unsuitable for others, they achieve some level of
protection against unwanted email messages. Each of these
anti-spam techniques has unique features that
distinguish it from the others, although none of them
is able to produce zero false
positives and zero false negatives, or to stop all
real-time and potential spam. The main reason for this is that
spammers are always evolving new tricks to deceive the
filters. Some well-designed filters (for example, Bayesian
filters) work very well, achieving success rates as good as 98-99%
at certain stages. But this number does not stay the
same; spammers are able to change it with ease. In this
paper, we verify that the capability of filtering mechanisms
can be enhanced by domain-specific training and
by incorporating user preferences. This enhancement of the
filtering capability can increase the performance of the filter.
We build on the fact that there is no
correlation between the receiving user’s area of interest and
the content of spam email. Many researchers have studied the
content of spam messages, and various categorisations have
been published. One categorisation on the basis of content
type is scams, adult, financial, pharmaceuticals, stock,
phishing, diploma, software, malware, gambling, dating and so
on (SpiderLabs, 2013). Filters are trained to identify spam on
the basis of these categories. But a filter trained this way
would look for features related to any of those categories
in every email received. The training dataset would
contain features from all of these categories, and the features
could be confusing for the filter: the spam training could
still work, but the ham features are anything other than these
categories. In fact, the cohort of emails in inboxes belonging
to users in different domains would be different. For example,
a user who belongs to the healthcare domain would receive
healthcare-related emails, unlike a user who belongs
to real estate. The same email may be ham to one user but spam
to another user, based upon their preferences. Users in
various domains have different preferences regarding the emails that
they would classify as spam or ham. For example, an email
from a bookseller trying to sell books on computer science
would be spam for a pharmacist. The interesting question
that arises is how the filters know which
emails are spam and which are ham for a particular user. Of
course, in some mail clients such as Gmail there is a
user preference setting where the user is given an option to
choose topic areas of interest, and the filter can then use
that information to classify incoming emails accordingly.
Very little work has gone into considering
specific user preferences while designing anti-spam filters.
Kim et al. (2007) construct a user preference ontology on
the basis of user profile and user actions and train the filter
on the basis of that ontology. Kim et al. (2006) suggest
user-action-based adaptive learning, in which they attach weights to
Bayesian classification on the basis of user actions. However,
none of this work addresses users belonging to a particular
domain and their corresponding preferences. Training
the filter at the client with the client's data is not new;
any user who installs SpamBayes trains it with
training data (Meyer and Whateley, 2004). An
organisation that uses SpamBayes to filter incoming email
for multiple users would retrain the filter on all
received email (Nelson et al., 2009). Training of filters with
different feature selection methods is also addressed by
Gomez and Moens (2010). The novelty in this case is that the
data we used for training is user-preferred data carefully
collected over a period of 5 years. The second important point that
affects the training is that the data collected belongs to a
particular user in a specific domain, not a general
user. This means that we are training the filter on the assumption that, if there is
no correlation between the receiver's domain area and the
email content, the email is not wanted by the user. A user
belonging to an educational organisation would have
different preferences from a user belonging to a
marketing organisation. Different users within an
organisation would have different preferences, and the same
message could be classified differently by different users. The
organisational filters cannot take care of such user
preferences. Hence, such emails end up in user inboxes as
false negatives. The dataset also takes into account such user
preferences. The filter is trained on a collection of
spam and ham emails classified by a user belonging to a
particular domain. We hypothesised that
domain-specific user preference training of the filter reduces the false
negatives in the user inbox. To test this hypothesis we
chose the spam filtering tool SpamBayes, installed it
on the Outlook mail client and trained it with domain-specific
user-preferred data. The next two sections give details on the
background of the tool and the experiments done using the
datasets.
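The idea that the same message can be ham for one user and spam for another can be illustrated with a toy per-user classifier. The sketch below is not SpamBayes and does not reproduce the experiment described above; `TinyBayes` is a hypothetical, heavily simplified Naïve Bayes with add-one smoothing, and the training sentences are invented.

```python
import math
from collections import Counter

class TinyBayes:
    """Toy Naive Bayes text classifier (illustrative assumption, not SpamBayes)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.msg_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.msg_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def classify(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.msg_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed log prior plus smoothed log likelihood of each word.
            score = math.log((self.msg_counts[label] + 1) / (total_msgs + 2))
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        return max(scores, key=scores.get)

# Each user trains a separate filter on mail labelled by that user.
pharmacist = TinyBayes()
pharmacist.train("new drug dosage guidelines", "ham")
pharmacist.train("pharmacy stock order", "ham")
pharmacist.train("computer science books sale", "spam")
pharmacist.train("cheap programming books", "spam")

lecturer = TinyBayes()
lecturer.train("computer science books for course", "ham")
lecturer.train("programming lecture notes", "ham")
lecturer.train("cheap drug offers", "spam")
lecturer.train("pharmacy discount pills", "spam")

message = "computer science books"
print(pharmacist.classify(message), lecturer.classify(message))
```

With these invented training sets the bookseller's message is labelled spam by the pharmacist's filter and ham by the lecturer's, which is the behaviour an organisation-wide filter cannot provide.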
Figure 3: Spam Volume
3. CONCLUSIONS
Email spam has been the focus of studies for a long time.
Though there are many different techniques to block spam
email messages from reaching users' inboxes, filtering is the most
commonly used mechanism and has had some success.
Given the large volume of email usage worldwide,
email spam is still plentiful and the scale of the problem is
enormous. Researchers and organisations make filters
smart and self-learning, but spammers are a step ahead. They
keep finding techniques to deceive the filters and their
learning mechanisms. Hence, the problem remains,
giving scope for researchers to work in the area. This work is
an effort in the same direction: to reduce the false negatives (spam) in
users' inboxes that have deceived the organisational
filters. It is observed that this further filtering, by training the
filter with user-specific data, did make a difference in the
amount of false positives. Future work involves creating
feature sets, including domain-specific keywords and
lists of organisations that can be fed to the filter, conducting
experiments, and then observing the results to record the
improvements. Spam is becoming one of the most annoying
and malicious additions to Internet technology. Traditional
spam filter software is unable to cope with the vast volumes of
spam that slip past anti-spam defenses. As spam problems
escalate, effective and efficient tools are required to control
them. Machine learning approaches have provided
researchers with a better way to combat spam. Machine
learning has been successfully applied in text classification.
Since e-mail contains text, the ML approach can be
readily applied to classify spam.
5. EXECUTIVE SUMMARY
• "E-mail is the single most important tool for
business communication."
• For many organizations, when their e-mail stops,
their ability to conduct business stops too.
• About 80% of the intellectual property of a typical
company passes through its e-mail server.
• There's a 72% chance of an e-mail failure in any
company each year, lasting an average of 62 hours.
• E-mail is set to grow by 68% in the next 5 years.
And legal discovery is a growing consideration,
making e-mail management more important.
• The indirect costs of e-mail, mainly loss of
productive time, are likely to be overlooked, even
though they can be high.
• E-mail costs are virtually impossible to generalize,
because one is comparing apples with pears with
oranges.
• Stand-alone e-mail is inexpensive and simple. The
disadvantages of free services (e.g. Gmail, Hotmail)
for businesses outweigh the benefit of the small cost
savings.
• Collaborative e-mail is typified by Microsoft
Exchange, and includes communications features
such as shared calendars, tasks and contacts;
smartphone and Outlook integration; and central
management and storage.
• Collaborative e-mail is relatively expensive.
• Exchange servers can be maintained in-house or
housed in a data centre.
• Shared hosted Exchange means renting space on a
hosted server that includes many other accounts.
• Cost comparisons for hosted vs in-house Exchange
are often published by Exchange hosting providers,
and tend to be distorted. Caveat emptor!
• Shared hosted Exchange can be considerably more
expensive over time than in-house, which also
allows control and flexibility.
• Hosted Exchange (shared or not) is an excellent
option for businesses with branches in different
geographic locations, and shared hosted Exchange
is good for very small companies that can't justify
the cost of an in-house server.
• Google Apps is an inexpensive alternative to
Exchange, but has several disadvantages and
doesn't work as well.
Figure 5: Anti-spam filtering system
6. SPAM-FILTERING METHODS
Blacklist
This popular spam-filtering method attempts to stop
unwanted email by blocking messages from a preset list of
senders that you or your organization’s system
administrator creates. Blacklists are records of email
addresses or Internet Protocol (IP) addresses that have been
previously used to send spam. When an incoming message
arrives, the spam filter checks to see if its IP or email address
is on the blacklist; if so, the message is considered spam and
rejected.
Though blacklists ensure that known spammers
cannot reach users' inboxes, they can also misidentify
legitimate senders as spammers. These so-called false
positives can result if a spammer happens to be sending junk
mail from an IP address that is also used by legitimate email
users. Also, since many clever spammers routinely switch IP
addresses and email addresses to cover their tracks, a
blacklist may not immediately catch the newest outbreaks.
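A blacklist check of this kind reduces to a set lookup. The sketch below is illustrative only; the list entries and the function name `is_blacklisted` are assumptions, and the addresses come from ranges reserved for documentation.

```python
# Hypothetical blacklist entries maintained by an administrator.
BLACKLISTED_ADDRESSES = {"spammer@bad.example"}
BLACKLISTED_IPS = {"203.0.113.7"}

def is_blacklisted(sender_address, sender_ip):
    """Return True if either the sender's address or IP is on the blacklist."""
    return (sender_address.lower() in BLACKLISTED_ADDRESSES
            or sender_ip in BLACKLISTED_IPS)

# A known spammer is rejected.
print(is_blacklisted("Spammer@bad.example", "198.51.100.1"))
# A legitimate sender sharing the listed IP is also rejected: a false positive.
print(is_blacklisted("alice@example.com", "203.0.113.7"))
# A spammer who has switched to a new address and IP slips through.
print(is_blacklisted("new-spammer@bad.example", "198.51.100.9"))
```

The second and third calls show the two weaknesses named above: shared IP addresses cause false positives, and address switching causes misses.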
Real-Time Blackhole List
This spam-filtering method works almost identically to a
traditional blacklist but requires less hands-on maintenance.
That’s because most real-time blackhole lists are maintained
by third parties, who take the time to build comprehensive
blacklists on behalf of their subscribers. Your filter
simply has to connect to the third-party system each time an
email comes in, to compare the sender’s IP address against
the list.
Since blackhole lists are large and frequently
updated, your organization's IT staff won't have to spend
time manually adding new IP addresses to the list, increasing
the chances that the filter will catch the newest junk-mail
outbreaks. But like blacklists, real-time blackhole lists can
also generate false positives if spammers happen to use a
legitimate IP address as a conduit for junk mail. Also, since
the list is likely to be maintained by a third party, you have
less control over which addresses are, or are not, on the
list.
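Many real-time blackhole lists are queried over DNS: the sender's IPv4 octets are reversed, the list's zone is appended, and a successful lookup means the address is listed. The sketch below assumes that convention; the zone `dnsbl.example.org` is a placeholder rather than a real service, and `is_listed` performs a live DNS query, so it is defined but not called here.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone="dnsbl.example.org"):
    """Build the DNS name used to look up an IPv4 address in a DNS-based list."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip, zone="dnsbl.example.org"):
    """Return True if the lookup resolves; an unresolved name means 'not listed'."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

print(dnsbl_query_name("192.0.2.99"))
```

Because the lookup happens for each incoming message, the filter always sees the list's current state without local maintenance.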
Whitelist
A whitelist blocks spam using a system almost exactly
opposite to that of a blacklist. Rather than letting you specify
which senders to block mail from, a whitelist lets you specify
which senders to allow mail from; these addresses are
placed on a trusted-users list. Most spam filters let you use a
whitelist in addition to another spam-fighting feature as a
way to cut down on the number of legitimate messages that
accidentally get flagged as spam. However, using a very strict
filter that only uses a whitelist would mean that anyone who
was not approved would automatically be blocked.
Some
anti-spam applications use a variation of this system known
as an automatic whitelist. In this system, an unknown
sender's email address is checked against a database; if they
have no history of spamming, their message is sent to the
recipient's inbox and they are added to the whitelist.
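The automatic whitelist can be sketched as follows. The names `handle_sender` and `spam_history` are hypothetical, and the `"hold"` outcome for senders with a spam history is an assumption, since the text does not say what happens to them.

```python
def handle_sender(sender, whitelist, spam_history):
    """Decide what to do with a message under an automatic whitelist."""
    if sender in whitelist:
        return "deliver"
    if sender in spam_history:
        # Assumed behaviour: pass the message on to other filters.
        return "hold"
    # Unknown sender with no spam history: deliver and trust from now on.
    whitelist.add(sender)
    return "deliver"

whitelist = {"boss@example.com"}
spam_history = {"offers@bad.example"}

print(handle_sender("newcontact@example.org", whitelist, spam_history))
print(handle_sender("offers@bad.example", whitelist, spam_history))
print("newcontact@example.org" in whitelist)
```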
Greylist
A relatively new spam-filtering technique, greylisting takes
advantage of the fact that many spammers only attempt to
send a batch of junk mail once. Under the greylist system, the
receiving mail server initially rejects messages from
unknown senders and sends a temporary failure message to the
originating server. If the originating server attempts to send the
message a second time (a step most legitimate servers will
take), the greylist assumes the message is not spam and
lets it proceed to the recipient's inbox. At this point, the
greylist filter will add the sender's email or IP address to a
list of allowed senders.
Though greylist filters require fewer
system resources than some other types of spam filters, they
also may delay mail delivery, which could be inconvenient
when you are expecting time-sensitive messages.
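The greylisting steps above can be sketched as a small state machine. The class name `Greylist` and the `min_delay` parameter (a minimum wait before a retry is accepted) are assumptions for illustration; the text does not specify a delay. Times are plain numbers of seconds so the example stays deterministic.

```python
class Greylist:
    """Toy greylist keyed on the sending IP (hypothetical sketch)."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay  # assumed minimum seconds before a retry counts
        self.first_seen = {}        # sender IP -> time of first attempt
        self.allowed = set()        # senders that have retried successfully

    def check(self, sender_ip, now):
        if sender_ip in self.allowed:
            return "accept"
        first = self.first_seen.get(sender_ip)
        if first is None:
            # First contact: record it and ask the sender to try again later.
            self.first_seen[sender_ip] = now
            return "tempfail"
        if now - first >= self.min_delay:
            # The sender retried after the delay: treat it as legitimate.
            self.allowed.add(sender_ip)
            return "accept"
        return "tempfail"

greylist = Greylist(min_delay=300)
print(greylist.check("198.51.100.5", now=0))
print(greylist.check("198.51.100.5", now=600))
```

The delay between the first attempt and the accepted retry is the delivery latency the section warns about.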
Content-Based Filters
Rather than enforcing across-the-board policies for all
messages from a particular email or IP address, content-
based filters evaluate words or phrases found in each
individual message to determine whether an email is spam
or legitimate.
Word-Based Filters
A word-based spam filter is the simplest type of content-based
filter. Generally speaking, word-based filters simply
block any email that contains certain terms. Since many spam
messages contain terms not often found in personal or
business communications, word filters can be a simple yet
capable technique for fighting junk email. However, if
configured to block messages containing more common
words, these types of filters may generate false positives. For
instance, if the filter has been set to stop all messages
containing the word "discount," emails from legitimate
senders offering your nonprofit hardware or software at a
reduced price may not reach their destination. Also note that
since spammers often purposefully misspell keywords in
order to evade word-based filters, your IT staff will need to
make time to routinely update the filter's list of blocked
words.
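A minimal sketch of a word-based filter, with an invented block list and a hypothetical function name, shows both the false positive and the misspelling evasion described above.

```python
import re

# Hypothetical list of blocked terms.
BLOCKED_TERMS = {"discount", "viagra"}

def contains_blocked_term(text, blocked=BLOCKED_TERMS):
    """Return True if any blocked term appears as a whole word in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return not words.isdisjoint(blocked)

# Legitimate offer blocked because it mentions "discount": a false positive.
print(contains_blocked_term("Nonprofit discount on software licences"))
# Deliberate misspelling slips past the list.
print(contains_blocked_term("Buy V1agra now"))
```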
Heuristic Filters
Heuristic (or rule-based) filters take things a step beyond
simple word-based filters. Rather than blocking messages
that contain a suspicious word, heuristic filters take multiple
terms found in an email into consideration.
Heuristic filters
scan the contents of incoming emails and assign points to
words or phrases. Suspicious words that are commonly
found in spam messages, such as "Rolex" or "Viagra," receive
higher points, while terms frequently found in normal emails
receive lower scores. The filter then adds up all the points
and calculates a total score. If the message receives a certain
score or higher (determined by the anti-spam application's
administrator), the filter identifies it as spam and blocks it.
Messages that score lower than the target number are
delivered to the user.
Heuristic filters work fast,
minimizing email delay, and are quite effective as soon as
they have been installed and configured. However, heuristic
filters configured to be aggressive may generate false
positives if a legitimate contact happens to send an email
containing a certain combination of words. Similarly, some
savvy spammers might learn which words to avoid
including, thereby fooling the heuristic filter into believing
they are benign senders.
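The point-scoring scheme can be sketched in a few lines. The point values, the threshold, and the use of negative points for terms common in normal mail are all assumptions chosen for illustration.

```python
import re

# Hypothetical point table: suspicious terms score high, everyday terms score low.
POINTS = {"rolex": 3.0, "viagra": 3.0, "free": 1.0, "meeting": -1.0, "agenda": -1.0}
THRESHOLD = 4.0  # assumed cut-off, set by the administrator

def spam_score(text):
    """Sum the points of every known term in the message."""
    return sum(POINTS.get(word, 0.0) for word in re.findall(r"[a-z]+", text.lower()))

def is_spam(text, threshold=THRESHOLD):
    return spam_score(text) >= threshold

print(spam_score("Free Rolex and Viagra"), is_spam("Free Rolex and Viagra"))
print(spam_score("Agenda for the free meeting"), is_spam("Agenda for the free meeting"))
```

Lowering `THRESHOLD` makes the filter more aggressive and raises the risk of false positives, which is the trade-off noted above.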
Bayesian Filters
Bayesian filters, considered the most advanced form of
content-based filtering, employ the laws of mathematical
probability to determine which messages are legitimate and
which are spam. In order for a Bayesian filter to effectively
block spam, the end user must initially "train" it by manually
flagging each message as either junk or legitimate. Over time,
the filter takes words and phrases found in legitimate emails
and adds them to a list; it does the same with terms found in
spam.
To determine which incoming messages are classified
as spam, the Bayesian filter scans the contents of the email
and then compares the text against its two word lists to
calculate the probability that the message is spam. For
instance, if the word "valium" has appeared 62 times in spam
messages but only three times in legitimate emails, there
is roughly a 95 percent chance (62 out of 65 occurrences) that an incoming email containing the
word "valium" is junk.
Because a Bayesian filter is constantly
building its word lists based on the messages that an
individual user receives, it theoretically becomes more
effective the longer it is used. However, since this method
does require a training period before it starts working well,
you will need to exercise patience and will probably have to
manually delete a few junk messages, at least at first.
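The "valium" figure can be reproduced with a short calculation. The sketch below assumes the filter has seen equal numbers of spam and legitimate messages, so raw counts can be compared directly; the combining rule for several words is the product formula described in Graham's "A Plan for Spam" [3], used here as one common choice rather than as the method of any particular product. The function names are hypothetical.

```python
import math

def word_spam_probability(spam_count, ham_count):
    """Estimate P(spam | word) from raw counts (assumes equal corpus sizes)."""
    return spam_count / (spam_count + ham_count)

def combine(probabilities):
    """Combine per-word probabilities, treating the words as independent."""
    p_spam = math.prod(probabilities)
    p_ham = math.prod(1.0 - p for p in probabilities)
    return p_spam / (p_spam + p_ham)

p_valium = word_spam_probability(62, 3)
print(round(p_valium, 2))

# A neutral word (probability 0.5) leaves the combined estimate unchanged.
print(round(combine([p_valium, 0.5]), 2))
```

Real filters also smooth these estimates so that a word seen only once or twice does not dominate the result.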
ACKNOWLEDGEMENT
I would like to give my sincere thanks to all
the people who have supported me, directly or indirectly, in completing this research
work. I am extremely grateful to my
parents for their love, prayers, caring and sacrifices for
educating and preparing me for my future.
REFERENCES
[1] B. Leiba and N. Borenstein. A Multi-Faceted Approach to
Spam Prevention, Proceedings of the First Conference on E-
mail and Anti-Spam, 2004.
[2] M. Leonard, M. Rodriguez, R. Segal, and R. Shoop.
Managing Customer Opt-Outs in a Complex Global
Environment. Proceedings of the First Conference on E-mail
and Anti-Spam, 2004.
[3] P. Graham, A Plan for Spam.
http://paulgraham.com/spam.html, August 2002.
[4] P. Graham, Better Bayesian Filtering.
http://paulgraham.com/better.html, January 2003.
[5] P. Capek, B. Leiba, and M. N. Wegman. Charity Begins at
your Mail Server,
http://www.research.ibm.com/people/w/wegman/charity.htm, 2004.
[6] J. Lyon and M. Wong. MTA Authentication Records in
DNS, Internet Draft,
http://www.ietf.org/internetdrafts/draft-ietf-marid-core-01.txt, June 2004.
[7] S. Schleimer, D. Wilkerson, and A. Aiken, Winnowing:
local algorithms for document fingerprinting. In Proceedings
of SIGMOD 2003, San Diego CA, June 9-12, 2003.
[8] T. Zhang and F. J. Oles. Text categorization based on
regularized linear classification methods. Information
Retrieval, 4:5-31, 2001.
[9] CLIFFORD, M., FAIGIN, D., BISHOP, M. & BRUTCH, T.
Miracle Cures and Toner Cartridges: Finding Solutions to the
Spam Problem. 19th annual computer security applications
conference (ACSAC 2003), 2003.
[10] DEEPAK, P. & SANDEEP, P. Spam filtering using spam
mail communities. In: SANDEEP, P., ed. Applications and
the Internet, 2005. Proceedings. The 2005 Symposium on,
2005.
[11] NAGAMALAI, D., DHINAKARAN, C. & LEE, J. K. Multi
Layer Approach to Defend DDoS Attacks Caused by Spam.
In: DHINAKARAN, C., ed. Multimedia and Ubiquitous
Engineering, 2007. MUE '07. International Conference on,
2007.
[12] Ferris Research (2009). Spam, Spammers, and Spam
Control: A White Paper by Ferris Research (March 2009).
[13] Pang X, Feng Y, Jiang W (2007). A Chinese Anti-Spam
Filter Approach Based on Support Vector Machine.
Management Science and Engineering, 2007. ICMSE 2007.
International Conference on, 20-22: 97-102, Aug. 2007.
[14] Rish I (2001). An empirical study of the naïve bayes
classifier
[15] JUNG, J. & EMIL, S. An Empirical Study of Spam Traffic
and the Use of DNS Black Lists. Internet Measurement
Conference, October 2004, Taormina, Italy.
[16] G. Sakkis, I. Androutsopoulos, G. Paliouras, V.
Karkaletsis, C. D. Spyropoulos and P. Stamatopoulos,
Stacking Classifiers for Anti-Spam Filtering of E-Mail, In
Proceedings of EMNLP-01, 6th Conference on Empirical
Methods in Natural Language Processing, 2001.
[17] P. Pantel and D. Lin. SpamCop -- A Spam Classification &
Organization Program. In Proceedings of AAAI-98 Workshop
on Learning for Text Categorization, 1998.
[18] I. Rigoutsos and T. Huynh. Chung-Kwei: a pattern-
discovery-based system for the automatic identification of
unsolicited e-mail messages (SPAM). Proceedings of the First
Conference on E-mail and Anti-Spam, 2004.
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 

Overview of Anti-spam Filtering Techniques Using Machine Learning

International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072 | Volume: 04 Issue: 01 | Jan-2017 | www.irjet.net
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 429

Overview of Anti-spam filtering Techniques
Sushma L. Wakchaure, Shailaja D. Pawar, Ganesh D. Ghuge, Bipin B. Shinde
Amrutvahini Polytechnic, Sangamner
Professor, Dept. of Computer Engineering, Amrutvahini Polytechnic College, Maharashtra, India.

Abstract - Electronic mail (E-mail) is an essential communication tool that has been greatly abused by spammers to disseminate unwanted information (messages) and spread malicious content to Internet users. Current Internet technologies have further accelerated the distribution of spam. Spam-reduction techniques have developed rapidly over the last few years, as spam volumes have increased. We believe that no one anti-spam solution is the “right” answer, and that the best approach is a multifaceted one, combining various forms of filtering with infrastructure changes, financial changes, legal recourse, and more, to provide a stronger barrier to spam than can be achieved with one solution alone. SpamGuru addresses the part of this multi-faceted approach that can be handled by technology on the recipient’s side, using plug-in tokenizers and parsers, plug-in classification modules, and machine-learning techniques to achieve high hit rates and low false-positive rates. Effective controls need to be deployed to counter the ever-growing spam problem. Machine learning provides better protective mechanisms that are able to control spam.
This paper summarizes the most common techniques used for anti-spam filtering based on analysis of the e-mail content, and also looks into machine learning algorithms such as Naïve Bayesian, support vector machine and neural network that have been adopted to detect and control spam. Each machine learning algorithm has its own strengths and limitations; as such, appropriate preprocessing needs to be carefully considered to increase the effectiveness of any given algorithm.

Key Words: Anti-spam filters, text categorization, electronic mail (E-mail), Spam Bayes, training email filters, content filtering, false negatives, user level spam filtering, machine learning

1. INTRODUCTION

Spam-reduction techniques have developed rapidly over the last few years, as spam volumes have increased. We believe that the spam problem requires a multi-faceted solution that combines a broad array of filtering techniques with various infrastructural changes, changes in financial incentives for spammers, legal approaches, and more [1]. This paper describes one part of a more comprehensive anti-spam research effort undertaken by us and our colleagues: SpamGuru, a collaborative anti-spam filter that combines several learning, tokenization, and user interface elements to provide enterprise-wide spam protection with high spam detection rates and low false-positive rates.

E-mail or electronic mail is an electronic messaging system that transmits messages across computer networks. Users simply type in the message, add the recipient’s e-mail address(es) and click the send button. Users can access any free e-mail service such as Yahoo mail, Gmail or Hotmail, or register with ISPs (Internet Service Providers), in order to obtain an e-mail account at no cost except for the Internet connection charges. In addition, e-mail is received almost immediately by the recipient once it is sent. E-mail allows users to communicate with each other at a low cost and provides an efficient mail delivery system.
The reliability, user-friendliness and availability of a wide range of free e-mail services make it the most popular and preferred communication tool. As such, businesses and individual users alike rely heavily on this communication tool to share information and knowledge. Businesses can drastically cut down on communication cost since e-mail is extremely fast and inexpensive; furthermore, it is a very powerful marketing tool. Businesses can capitalize on this technology since it is a very popular advertising tool. However, the simplicity of sending e-mail and its almost non-existent cost pose another problem: spam. Spam refers to bulk unsolicited commercial e-mail sent indiscriminately to users. Table 1 enumerates some of them. Based on Ferris Research (2009), spam can be categorized into the following:

1. Health; such as fake pharmaceuticals;
2. Promotional products; such as fake fashion items (for example, watches);
3. Adult content; such as pornography and prostitution;
4. Financial and refinancing; such as stock kiting, tax solutions, loan packages;
5. Phishing and other fraud; such as “Nigerian 419” and “Spanish Prisoner”;
6. Malware and viruses; Trojan horses attempting to infect your PC with malware;
7. Education; such as online diploma;
8. Marketing; such as direct marketing material, sexual enhancement products;
9. Political; US president votes.

*Corresponding author. E-mail: alaa_taqa@um.edu.my.

2. E-MAIL STRUCTURE

E-mail messages are divided into 2 parts: header information and message body. Header information or the header field consists of information about the message’s
transportation, which generally shows the following information:

1. From: displays the sender’s details such as e-mail address;
2. To: displays the receiver’s details such as e-mail address;
3. Date: displays the date the e-mail was sent to the recipient;
4. Received: intermediary server’s information and the date the e-mail message is processed;
5. Reply to: reply address;
6. Subject: the subject of the message specified by the sender;
7. Message Id: unique id of the message; and others.

The message body contains the message of the e-mail. E-mail messages are presented in plain text or HTML. An e-mail may also have attachments such as graphics, video or other format types, and MIME (Multipurpose Internet Mail Extensions) is used to carry these attachments.

SPAMMER TRICKS

In order to send spam, spammers first obtain e-mail addresses by harvesting addresses through the Internet using specialized software. This software systematically gathers e-mail addresses from discussion groups or websites (Schaub, 2002). Spammers are also able to purchase or rent collections of e-mail addresses from other spammers or service providers. Table 2 indicates the many tricks used by spammers to avoid detection by spam filters.

Figure 1: Email Structure

3. SPAM CONTROL TECHNIQUES

Anti-spam techniques and methods try to tell a spam email apart from a legitimate email. As a typical email consists of a few components such as the header, the body and attachments, the algorithms that classify emails may use different features of the mail components to make decisions about them.
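To make the header/body split from Section 2 concrete, the following is a minimal sketch using Python's standard `email` package. The sample message, the addresses, and the helper name `split_message` are illustrative assumptions and do not come from the paper; the sketch only shows how the header fields listed above and the message body could be separated before any classifier sees them.

```python
from email import message_from_string
from email.policy import default

# Hypothetical sample message; all addresses and values are made up.
RAW = """\
From: alice@example.com
To: bob@example.org
Date: Mon, 02 Jan 2017 10:00:00 +0000
Received: from mail.example.com by mx.example.org; Mon, 02 Jan 2017 10:00:05 +0000
Reply-To: alice@example.com
Subject: Meeting notes
Message-ID: <1234@example.com>
Content-Type: text/plain; charset="utf-8"

Please find the notes from today's meeting below.
"""

SINGLE_HEADERS = ("From", "To", "Date", "Reply-To", "Subject", "Message-ID")


def split_message(raw):
    """Return (headers, body, attachment_names) for a raw e-mail string."""
    msg = message_from_string(raw, policy=default)
    # Header fields that normally occur once.
    headers = {name: str(msg.get(name, "")) for name in SINGLE_HEADERS}
    # "Received" is added by every intermediary server, so collect all of them.
    headers["Received"] = [str(value) for value in msg.get_all("Received", [])]
    # Prefer the plain-text body, fall back to HTML.
    part = msg.get_body(preferencelist=("plain", "html"))
    body = part.get_content() if part is not None else ""
    # MIME attachments, if any (none in the sample above).
    attachments = [p.get_filename() for p in msg.iter_attachments()]
    return headers, body, attachments


headers, body, attachments = split_message(RAW)
```

A content-based filter would then work mainly on `body` and `headers["Subject"]`, while list-based techniques would look at `headers["From"]` and the `Received` chain.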
A lot of work has gone into finding solutions to the spam problem from different dimensions and directions (Islam and Zhou, 2007, Zhang et al., 2012, Xiao-wei and Zhong-feng, 2012, Rajendran and Pandey, 2012, du Toit and Kruger, 2012, Xiao et al., 2010, Wei et al., 2010, So Young and Shin Gak, 2008, Klonowski and Strumiński, 2008, Horie and Neville, 2008, Nhung and Phuong, 2007, McGibney and Botvich, 2007, Liu et al., 2007, Huai-bin et al., 2005, Moon et al., 2004, Wu and Tsai) over the last decade. Various anti-spam solutions are available and have been surveyed by many researchers (Blanzieri and Bryl, 2008, Caruana and Li, 2012, Guzella and Caminhas, 2009, Lai, 2007, Yu and Xu, 2008, Paswan et al., 2012, Nazirova, 2011). These include blacklists, whitelists, greylists, content-based filtering, feature selection methods, bag-of-words, machine learning techniques (such as Naïve Bayes, support vector machines, artificial neural networks, lazy learning, etc.), reputation-based techniques, artificial immune systems, protocol-based procedures, and so on. Caruana and Li (2012) also list some emerging approaches such as peer-to-peer computing, grid computing, social networks and ontology-based semantics, along with a few other approaches. These solutions can be grouped into various categories such as list-based techniques and filtering techniques; another categorisation is prevention, detection and reaction techniques (Nakulas et al., 2009). Paswan et al. (2012) categorise email spam filtering techniques as origin-based spam filtering, content-based filtering, feature selection methods, feature extraction methods, and traffic-based filtering. The scope of this paper is content-based filtering and, specifically, learning-based filters. Hence, we do not go into the detail of each of these solutions but limit ourselves to the Bayes algorithm.
Spammers are insensitive to the consequences of their activities and need to be dissuaded by being made to pay the internet service providers for the bandwidth wasted by unwanted spam blocked by the servers. This would be a feasible deterrent to reduce spam. To execute this, all service providers must act in unison and agree to get spammers to pay for the inconvenience of spam and the clean-up of servers.

Figure 2: Controlling of Spam mail
4. USER PREFERRED DOMAIN SPECIFIC TRAINING OF FILTER

In this work, we have been able to identify different types of anti-spam techniques, exemplified in the use of filters and other characteristic means to deter spammers. Although these anti-spam techniques may be suitable to some users and unsuitable to others, they achieve some level of protection against unwanted email messages. Each of these anti-spam techniques has its unique feature that distinguishes it from the others, although none of them is able to produce zero false positives and zero false negatives, or to stop all real-time and potential spam. The main reason for this is that spammers are always evolving new tricks to deceive the filters. Some well-designed filters (for example, Bayesian filters) work very well, achieving success rates as good as 98-99% at certain stages. But this number does not stay the same. Spammers are able to vary these with ease.

In this paper, we are verifying that the capability of filtering mechanisms can be enhanced by domain-specific training and by incorporating user preferences. This enhancement of the filtering capability can increase the performance of the filter. We build on the fact that there is no correlation between the receiving user’s area of interest and the content of spam email. Many researchers have studied the content of spam messages and various categorisations have been published. One categorisation on the basis of content type is scams, adult, financial, pharmaceuticals, stock, phishing, diploma, software, malware, gambling, dating and so on (SpiderLabs, 2013). Filters are trained to identify spam on the basis of these categories.
But a filter trained in this way looks for features related to any of those categories in every email received. The training dataset contains features from all of these categories, which can confuse the filter: the spam training may still work, but the ham features are simply anything other than these categories. In fact, the cohort of emails in inboxes belonging to users in different domains would be different. For example, a user who belongs to the healthcare domain would receive healthcare-related emails, unlike a user who belongs to real estate. The same email may be ham to one user but spam to another, based upon their preferences. Users in various domains have different preferences about which emails they would classify as spam or ham. For example, an email from a bookseller trying to sell books on Computer Science would be spam for a pharmacist. The interesting question that arises is how the filters know which emails are spam and which are ham for a particular user. Of course, some mail clients such as Gmail offer a user preference setting where the user can choose the topic areas of interest, and the filter can then use that information to classify incoming emails accordingly.

Very little work has gone into considering specific user preferences while designing anti-spam filters. Kim et al. (2007) construct a user preference ontology on the basis of user profile and user actions and train the filter on the basis of that ontology. Kim et al. (2006) suggest user-action-based adaptive learning, where they attach weights to Bayesian classification on the basis of user actions. However, none of this work addresses users belonging to a particular domain and their corresponding preferences. Training the filter at the client with the client's data is not new; any user who installs SpamBayes trains it with training data (Meyer and Whateley, 2004).
An organisation that uses SpamBayes to filter incoming email for multiple users would retrain the filter on all received email (Nelson et al., 2009). Training of the filter with different feature selection methods is also addressed in (Gomez and Moens, 2010). The novelty in this case is that the data we used for training is user-preferred data carefully collected over a period of 5 years. The second important point that affects the training is that the data collected belongs to a particular user in a specific domain, not to a general user. This means that we train the filter on the principle that, if there is no correlation between the receiver’s domain area and the email content, the email is not wanted by the user. A user belonging to an educational organisation would have different preferences from a user belonging to a marketing organisation. Different users within an organisation would have different preferences, and the same message could be classified differently by different users. The organisational filters cannot take care of such user preferences. Hence, such emails end up in user inboxes as false negatives. The dataset also takes such user preferences into account. The filter is trained on the basis of a collection of spam and ham emails classified by a user belonging to a particular domain.

We made the hypothesis that domain-specific user preference training of the filter reduces the false negatives in the user inbox. To test this hypothesis, we chose the spam filtering tool called SpamBayes, installed it on the Outlook mail client and trained it with domain-specific user-preferred data. The next two sections give details on the background of the tool and the experiments done using the datasets.

Figure 3: Spam Volume
3. CONCLUSIONS

Email spam has been the focus of studies for a long time. Though there are many different techniques to keep spam email messages from reaching users' inboxes, filtering is the most commonly used mechanism and has gained success to some extent. Given the heavy use of email worldwide, email spam is still plentiful and the scale of the problem is enormous. Researchers and organisations make the filters smart and self-learning, but spammers are a step ahead. They keep on finding techniques to deceive the filters and their learning mechanisms. Hence, the problem still remains, giving scope for researchers to work in the area. This work is an effort in the same scope to reduce false negatives, that is, spam in the inbox of the users which has deceived the organisational filters. It is observed that this further filtering, by training the filter with user-specific data, did make a difference in the amount of false positives. Future work involves creating the feature sets, including creating domain-specific keywords and lists of organisations which can be fed to the filter, conducting experiments and then observing the results to record the improvements.

Spam is becoming one of the most annoying and malicious additions to Internet technology. Traditional spam filter software is unable to cope with the vast volumes of spam that slip past anti-spam defenses. As spam problems escalate, effective and efficient tools are required to control them. Machine learning approaches have provided researchers with a better way to combat spam. Machine learning has been successfully applied in text classification. Since e-mail contains text, the ML approach can be readily applied to classify spam.
5. EXECUTIVE SUMMARY

• "E-mail is the single most important tool for business communication."
• For many organizations, when their e-mail stops, their ability to conduct business stops too.
• About 80% of the intellectual property of a typical company passes through its e-mail server.
• There's a 72% chance of an e-mail failure in any company each year, lasting an average of 62 hours.
• E-mail is set to grow by 68% in the next 5 years. And legal discovery is a growing consideration, making e-mail management more important.
• The indirect costs of e-mail, mainly loss of productive time, are likely to be overlooked, even though they can be high.
• E-mail costs are virtually impossible to generalize, because one is comparing apples with pears with oranges.
• Stand-alone e-mail is inexpensive and simple. The disadvantages of free services (e.g. Gmail, Hotmail) for businesses outweigh the benefit of the small cost savings.
• Collaborative e-mail is typified by Microsoft Exchange, and includes communications features such as shared calendars, tasks and contacts; smartphone and Outlook integration; and central management and storage.
• Collaborative e-mail is relatively expensive.
• Exchange servers can be maintained in-house or housed in a data centre.
• Shared hosted Exchange means renting space on a hosted server that includes many other accounts.
• Cost comparisons for hosted vs in-house Exchange are often published by Exchange hosting providers, and tend to be distorted. Caveat emptor!
• Shared hosted Exchange can be considerably more expensive over time than in-house, which also allows control and flexibility.
• Hosted Exchange (shared or not) is an excellent option for businesses with branches in different geographic locations, and shared hosted Exchange is good for very small companies that can't justify the cost of an in-house server.
• Google Apps is an inexpensive alternative to Exchange, but has several disadvantages and doesn't work as well.
Figure 5: Anti-spam filtering system

6. SPAM-FILTERING METHODS

Blacklist

This popular spam-filtering method attempts to stop unwanted email by blocking messages from a preset list of senders that you or your organization’s system administrator create. Blacklists are records of email addresses or Internet Protocol (IP) addresses that have been previously used to send spam. When an incoming message arrives, the spam filter checks to see if its IP or email address
is on the blacklist; if so, the message is considered spam and rejected. Though blacklists ensure that known spammers cannot reach users' inboxes, they can also misidentify legitimate senders as spammers. These so-called false positives can result if a spammer happens to be sending junk mail from an IP address that is also used by legitimate email users. Also, since many clever spammers routinely switch IP addresses and email addresses to cover their tracks, a blacklist may not immediately catch the newest outbreaks.

Real-Time Blackhole List

This spam-filtering method works almost identically to a traditional blacklist but requires less hands-on maintenance. That’s because most real-time blackhole lists are maintained by third parties, who take the time to build comprehensive blacklists on behalf of their subscribers. Your filter simply has to connect to the third-party system each time an email comes in, to compare the sender’s IP address against the list. Since blackhole lists are large and frequently maintained, your organization's IT staff won't have to spend time manually adding new IP addresses to the list, increasing the chances that the filter will catch the newest junk-mail outbreaks. But like blacklists, real-time blackhole lists can also generate false positives if spammers happen to use a legitimate IP address as a conduit for junk mail. Also, since the list is likely to be maintained by a third party, you have less control over which addresses are on, or not on, the list.

Whitelist

A whitelist blocks spam using a system almost exactly opposite to that of a blacklist.
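The contrast between the two kinds of list can be illustrated with a minimal sketch. The list contents, the addresses and all function names below are hypothetical and chosen only for illustration. The helper `dnsbl_query_name` additionally assumes that the real-time blackhole list is consulted over DNS, a common convention in which the IPv4 octets are reversed and prepended to the list's zone; the text above only says that the filter connects to the third-party system, so treat this part as an assumption.

```python
# Hypothetical example data; a real filter would load these from configuration.
BLACKLISTED_SENDERS = {"offers@spam.example"}
BLACKLISTED_IPS = {"203.0.113.7"}
WHITELISTED_SENDERS = {"colleague@example.org"}


def is_blacklisted(sender: str, ip: str) -> bool:
    """Blacklist: reject when the sender address or the IP is on the list."""
    return sender.lower() in BLACKLISTED_SENDERS or ip in BLACKLISTED_IPS


def is_whitelisted(sender: str) -> bool:
    """Whitelist: the opposite logic, accept only senders on the trusted list."""
    return sender.lower() in WHITELISTED_SENDERS


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name looked up for an IPv4 address on a DNS-based list.

    Assumption: the list follows the usual reversed-octet convention.
    """
    return ".".join(reversed(ip.split("."))) + "." + zone
```

For example, `dnsbl_query_name("203.0.113.7", "dnsbl.example")` yields `7.113.0.203.dnsbl.example`; the zone name here is a placeholder, not a real service.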
Rather than letting you specify which senders to block mail from, a whitelist lets you specify which senders to allow mail from; these addresses are placed on a trusted-users list. Most spam filters let you use a whitelist in addition to another spam-fighting feature as a way to cut down on the number of legitimate messages that accidentally get flagged as spam. However, using a very strict filter that only uses a whitelist would mean that anyone who was not approved would automatically be blocked. Some anti-spam applications use a variation of this system known as an automatic whitelist. In this system, an unknown sender's email address is checked against a database; if they have no history of spamming, their message is sent to the recipient's inbox and they are added to the whitelist.

Greylist

A relatively new spam-filtering technique, greylists take advantage of the fact that many spammers only attempt to send a batch of junk mail once. Under the greylist system, the receiving mail server initially rejects messages from unknown users and sends a failure message to the originating server. If the mail server attempts to send the message a second time (a step most legitimate servers will take), the greylist assumes the message is not spam and lets it proceed to the recipient's inbox. At this point, the greylist filter will add the sender's email or IP address to a list of allowed senders. Though greylist filters require fewer system resources than some other types of spam filters, they also may delay mail delivery, which could be inconvenient when you are expecting time-sensitive messages.

Content-Based Filters

Rather than enforcing across-the-board policies for all messages from a particular email or IP address, content-based filters evaluate words or phrases found in each individual message to determine whether an email is spam or legitimate.

Word-Based Filters

A word-based spam filter is the simplest type of content-based filter.
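Before turning to content-based filters in detail, the greylisting procedure described under "Greylist" can be sketched as a small state machine. This is a simplified illustration, not the behaviour of any particular mail server: the class name `Greylist`, the return values, and the choice to key the allowed list on the sender's IP address are assumptions, and the minimum retry delay that practical implementations usually enforce is left out.

```python
class Greylist:
    """Reject the first delivery attempt from an unknown sender, accept retries."""

    def __init__(self):
        self.seen = set()     # (ip, sender, recipient) triples rejected once
        self.allowed = set()  # sender IPs that have retried successfully

    def handle(self, ip: str, sender: str, recipient: str) -> str:
        # Known good senders pass straight through.
        if ip in self.allowed:
            return "accept"
        triple = (ip, sender, recipient)
        if triple in self.seen:
            # Second attempt: assume a legitimate server and remember it.
            self.allowed.add(ip)
            return "accept"
        # First attempt from an unknown sender: reject and wait for a retry.
        self.seen.add(triple)
        return "reject"


greylist = Greylist()
first = greylist.handle("198.51.100.9", "alice@example.com", "bob@example.org")
second = greylist.handle("198.51.100.9", "alice@example.com", "bob@example.org")
```

Here `first` is a rejection and `second` an acceptance, which mirrors the delay in delivery that the text mentions as the main drawback of the technique.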
Generally speaking, word-based filters simply block any email that contains certain terms. Since many spam messages contain terms not often found in personal or business communications, word filters can be a simple yet capable technique for fighting junk email. However, if configured to block messages containing more common words, these types of filters may generate false positives. For instance, if the filter has been set to stop all messages containing the word "discount," emails from legitimate senders offering your nonprofit hardware or software at a reduced price may not reach their destination. Also note that since spammers often purposefully misspell keywords in order to evade word-based filters, your IT staff will need to make time to routinely update the filter's list of blocked words.

Heuristic Filters

Heuristic (or rule-based) filters take things a step beyond simple word-based filters. Rather than blocking messages that contain a suspicious word, heuristic filters take multiple terms found in an email into consideration. Heuristic filters scan the contents of incoming emails and assign points to words or phrases. Suspicious words that are commonly found in spam messages, such as "Rolex" or "Viagra," receive higher points, while terms frequently found in normal emails receive lower scores. The filter then adds up all the points and calculates a total score. If the message receives a certain score or higher (determined by the anti-spam application's administrator), the filter identifies it as spam and blocks it. Messages that score lower than the target number are delivered to the user. Heuristic filters work fast, minimizing email delay, and are quite effective as soon as they have been installed and configured. However, heuristic filters configured to be aggressive may generate false positives if a legitimate contact happens to send an email containing a certain combination of words. Similarly, some savvy spammers might learn which words to avoid
including, thereby fooling the heuristic filter into believing they are benign senders.

Bayesian Filters

Bayesian filters, considered the most advanced form of content-based filtering, employ the laws of mathematical probability to determine which messages are legitimate and which are spam. In order for a Bayesian filter to effectively block spam, the end user must initially "train" it by manually flagging each message as either junk or legitimate. Over time, the filter takes words and phrases found in legitimate emails and adds them to a list; it does the same with terms found in spam. To determine which incoming messages are classified as spam, the Bayesian filter scans the contents of the email and then compares the text against its two word lists to calculate the probability that the message is spam. For instance, if the word "valium" has appeared 62 times in the spam message list but only three times in legitimate emails, there is a 95 percent chance (62 out of 65 occurrences) that an incoming email containing the word "valium" is junk. Because a Bayesian filter is constantly building its word list based on the messages that an individual user receives, it theoretically becomes more effective the longer it's used. However, since this method does require a training period before it starts working well, you will need to exercise patience and will probably have to manually delete a few junk messages, at least at first.

ACKNOWLEDGEMENT

I would like to give my sincere thanks to all the people who have supported me, directly or indirectly, in completing this research work. I am extremely grateful to my parents for their love, prayers, caring and sacrifices for educating and preparing me for my future.

REFERENCES

[1] B. Leiba and N. Borenstein.
A Multi-Faceted Approach to Spam Prevention, Proceedings of the First Conference on E-mail and Anti-Spam, 2004.
[2] M. Leonard, M. Rodriguez, R. Segal, and R. Shoop. Managing Customer Opt-Outs in a Complex Global Environment. Proceedings of the First Conference on E-mail and Anti-Spam, 2004.
[3] P. Graham, A Plan for Spam. http://paulgraham.com/spam.html, August 2002.
[4] P. Graham, Better Bayesian Filtering. http://paulgraham.com/better.html, January 2003.
[5] P. Capek, B. Leiba, and M. N. Wegman. Charity Begins at your Mail Server, http://www.research.ibm.com/people/w/wegman/charity.htm, 2004.
[6] J. Lyon and M. Wong. MTA Authentication Records in DNS, Internet Draft, http://www.ietf.org/internetdrafts/draft-ietf-marid-core-01.txt, June 2004.
[7] S. Schleimer, D. Wilkerson, and A. Aiken, Winnowing: local algorithms for document fingerprinting. In Proceedings of SIGMOD 2003, San Diego CA, June 9-12, 2003.
[8] T. Zhang and F. J. Oles. Text categorization based on regularized linear classification methods. Information Retrieval, 4:5-31, 2001.
[9] CLIFFORD, M., FAIGIN, D., BISHOP, M. & BRUTCH, T. Miracle Cures and Toner Cartridges: Finding Solutions to the Spam Problem. 19th Annual Computer Security Applications Conference (ACSAC 2003), 2003.
[10] DEEPAK, P. & SANDEEP, P. Spam filtering using spam mail communities. In: SANDEEP, P., ed. Applications and the Internet, 2005. Proceedings. The 2005 Symposium on, 2005.
[11] NAGAMALAI, D., DHINAKARAN, C. & LEE, J. K. Multi Layer Approach to Defend DDoS Attacks Caused by Spam. In: DHINAKARAN, C., ed. Multimedia and Ubiquitous Engineering, 2007. MUE '07. International Conference on, 2007.
[12] Ferris Research (2009). Spam, Spammers, and Spam Control: A White Paper by Ferris Research (March 2009).
[13] Pang X, Feng Y, Jiang W (2007). A Chinese Anti-Spam Filter Approach Based on Support Vector Machine. Management Science and Engineering, 2007. ICMSE 2007. International Conference on, 20-22: 97-102, Aug. 2007.
[14] Rish I (2001).
An empirical study of the naïve Bayes classifier.
[15] JUNG, J. & EMIL, S. An Empirical Study of Spam Traffic and the Use of DNS Black Lists. Internet Measurement Conference, October 2004, Taormina, Italy.
[16] G. Sakkis, I. Androutsopoulos, G. Paliouras, V. Karkaletsis, C. D. Spyropoulos and P. Stamatopoulos, Stacking Classifiers for Anti-Spam Filtering of E-Mail, In Proceedings of EMNLP-01, 6th Conference on Empirical Methods in Natural Language Processing, 2001.
[17] P. Pantel and D. Lin. SpamCop -- A Spam Classification & Organization Program. In Proceedings of AAAI-98 Workshop on Learning for Text Categorization, 1998.
[18] I. Rigoutsos and T. Huynh. Chung-Kwei: a pattern-discovery-based system for the automatic identification of unsolicited e-mail messages (SPAM). Proceedings of the First Conference on E-mail and Anti-Spam, 2004.