The International Journal of Engineering And Science (IJES)
||Volume|| 2 ||Issue|| 1 ||Pages|| 62-64 ||2013||
ISSN: 2319 – 1813 ISBN: 2319 – 1805
Proposed Phishing mail detection using fuzzy classification methods
1Ami K. Trivedi, 2G.J. Sahani
1(Department of Computer Engineering, Sardar Patel Institute of Technology, Vasad, Gujarat)
2(Department of Computer Engineering, Sardar Patel Institute of Technology, Vasad)

------------------------------------------------------Abstract------------------------------------------------
Phishing is an attack that uses social engineering to illegally acquire someone else's data by posing as a
legitimate website, and to use that data for the attacker's own benefit. If even a few victims fall for the
pitch, the effort is profitable. Phishing mails misrepresent the true sender and steal personal and financial
account credentials. Most detection techniques use decision trees, machine learning algorithms, genetic
algorithms or clustering techniques. These techniques use crisp logic: they classify email as either spam or
not spam. Crisp logic often fails because the boundaries between these classes are not sharp. Several
Artificial Intelligence (AI) techniques, including neural networks and fuzzy logic [7-9], have been successfully
applied to a wide variety of real-world decision-making problems. To our knowledge, no phishing mail detection
system based on a fuzzy classification method has been developed. In this work we propose to build a fuzzy rule
generation system to detect phishing mail. It classifies email into categories such as very legitimate,
legitimate, suspicious, phishy and very phishy. The main advantages of fuzzy rule-based classification systems
are that they do not require large memory storage, their inference speed is very high, and users can carefully
examine each fuzzy if-then rule.
-----------------------------------------------------------------------------------------------------------------
Date of Submission: 17, December, 2012                               Date of Publication: 05, January 2013
-----------------------------------------------------------------------------------------------------------------

I.    Introduction

          Mail spammers can be categorized based on their intent. Some spammers are telemarketers, who
broadcast unsolicited emails to hundreds or thousands of email users. They do not have a specific target, but
blindly send the broadcast and expect a very limited rate of return. The next category of spammers comprises
the opt-in spammers, who keep sending unsolicited messages even though you have little or no interest in them.
In some cases, they spam you with unrelated topics or marketing material. Examples are conference notices,
professional news and meeting announcements. The third category of spammers is called phishers. Phishing
attackers misrepresent the true sender and steal the consumers' personal identity data and financial account
credentials. These spammers send spoofed emails and lead consumers to counterfeit websites designed to trick
recipients into divulging financial data such as credit card numbers, account usernames, passwords and social
security numbers. By hijacking brand names of banks, e-retailers and credit card companies, phishers often
convince the recipients to respond.

II.     Literature review and related work
          Research on header analysis was reported by Microsoft, IBM, and Cornell University in the 2004 and
2005 anti-spam conferences [4] [5]. The filtering technique of Barry Leiba et al. [4] makes use of the path
travelled by an email, which is extracted from the email header. They analysed the end units in the path but
not the end-to-end path. They claim their approach complements the existing filters but does not work as a
standalone mechanism. Christine E. Drake et al. [5], in their paper "Anatomy of a Phishing Email", discuss the
tricks employed by email scammers in phishing emails. In the paper "Anti-Phishing Techniques: A Review" [1] the
authors proposed a content filtering methodology in which the content of an email is filtered using machine
learning methods such as Bayesian Additive Regression Trees (BART) and support vector machines (SVM). They also
used a genetic algorithm to detect phishing email, but the problem with this algorithm is that it only works
for one rule; if there is more than one rule, the algorithm becomes more complex. The character-based
anti-phishing approach checks for the URLs most often used by attackers, but this approach results in false
positives. In the paper "Detecting Phishing in Emails" [2] the authors suggest classification based on three
kinds of analyses of the header: DNS-based header analysis, Social
Network analysis and Wantedness analysis. In the DNS-based header analysis, they classified the corpus into 8
buckets and used social network analysis to further reduce the false positives. They introduced the concepts of
wantedness and credibility, and derived equations to calculate the wantedness values of the email senders.
Finally, they used credibility and all three analyses to classify the phishing emails. The method resulted in
far fewer false positives. They plan to perform two more analyses on the incoming email traffic: i) End-to-End
Path analysis, by which they try to establish the legitimacy of the path taken by an email; ii) Relay analysis,
by which they verify the trustworthiness and reputation of the relays participating in the relaying of emails.
They combine all the methods, i) Path analysis, ii) Relay analysis, iii) DNS-based header analysis and iv)
Social Network analysis, to develop an email classifier which classifies the incoming email traffic as i)
Phishing emails, ii) Socially Wanted emails (Legitimate emails) or iii) Socially Unwanted emails (Spam emails).
They concentrate only on the email header, not on the content of the email. The paper "Learning to Detect
Phishing in Email" [3] proposed a machine learning approach that uses a random forest as the classifier,
evaluated with 10-fold cross validation, and that only categorizes email as spam or not spam. It does not
classify mail as legitimate, spam or phishing.

III.    Fuzzy Systems
          Fuzzy logic was invented by Zadeh [6] in 1965 for handling uncertain and imprecise knowledge in real
world applications. It has proved to be a powerful tool for decision-making, and for handling and manipulating
imprecise and noisy data. The first major commercial application was in the area of cement kiln control. This
requires that an operator monitor four internal states of the kiln, control four sets of operations, and
dynamically manage 40 or 50 "rules of thumb" about their interrelationships, all with the goal of controlling a
highly complex set of chemical interactions. One such rule is "If the oxygen percentage is rather high and the
free-lime and kiln-drive torque rate is normal, decrease the flow of gas and slightly reduce the fuel rate".
The notion central to fuzzy systems is that truth values (in fuzzy logic) or membership values (in fuzzy sets)
are indicated by a value in the range [0.0, 1.0], with 0.0 representing absolute Falseness and 1.0 representing
absolute Truth. A fuzzy system is characterized by a set of linguistic statements based on expert knowledge.
The expert knowledge is usually in the form of "if-then" rules.
          A fuzzy set A in X is characterized by a membership function, which is easily implemented by fuzzy
conditional statements of the form "If antecedent Then consequent". For example, if the antecedent is true to
some degree of membership, then the consequent is also true to that same degree.
          In a fuzzy classification system, a case or an object can be classified by applying a set of fuzzy
rules based on the linguistic values of its attributes. Every rule has a weight, which is a number between 0
and 1, and this is applied to the number given by the antecedent. Applying a rule involves two distinct parts.
The first part involves evaluating the antecedent: fuzzifying the input and applying any necessary fuzzy
operators. The second part requires application of that result to the consequent, known as inference. To build
a fuzzy classification system, the most difficult task is to find a set of fuzzy rules pertaining to the
specific classification problem. A fuzzy inference system is a rule-based system that uses fuzzy logic, rather
than Boolean logic, to reason about data. Its basic structure includes four main components: (1) a fuzzifier,
which translates crisp (real-valued) inputs into fuzzy values; (2) an inference engine, which applies a fuzzy
reasoning mechanism to obtain a fuzzy output; (3) a defuzzifier, which translates this latter output into a
crisp value; and (4) a knowledge base, which contains both an ensemble of fuzzy rules, known as the rule base,
and an ensemble of membership functions, known as the database. The decision-making process is performed by the
inference engine using the rules contained in the rule base. These fuzzy rules define the connection between
input and output fuzzy variables.

IV.     Proposed fuzzy classification based approach
          To detect phishing mail, we propose four direct rule generation methods based on fuzzy
classification. The first method generates fuzzy if-then rules using the mean and the standard deviation of
attribute values. The second approach generates fuzzy if-then rules using the histogram of attribute values.
The third procedure generates fuzzy if-then rules with a certainty grade by partitioning each attribute into
homogeneous fuzzy sets. In the fourth approach, only overlapping areas are partitioned. The first two
approaches generate a single fuzzy if-then rule for each class by specifying the membership function of each
antecedent fuzzy set using the information about attribute values of training patterns. The other two
approaches are based on fuzzy grids with homogeneous fuzzy partitions of each attribute. To start with these
methods, linguistic descriptors such as high, low and medium are assigned to a range of values for each key
phishing characteristic indicator. Valid ranges of the inputs are considered and divided into classes, or fuzzy
sets. For example, length of URL
address can range from 'low' to 'high' with other values in between. We cannot specify clear boundaries between
classes. The degree of belongingness of the values of the variables to any selected class is called the degree
of membership. A membership function is designed for each phishing characteristic indicator; it is a curve that
defines how each point in the input space is mapped to a membership value in the interval [0, 1]. The
linguistic values assigned to each phishing indicator are low, moderate and high, while the values assigned to
the output (the rating of the mail) are Very legitimate, Legitimate, Suspicious, Phishy and Very phishy.
Triangular and trapezoidal membership functions are used. The values of each input range from 0 to 10, while
the output ranges from 0 to 100. The flow of our proposed system is as follows:

Figure 1: Flow of Proposed Fuzzy Classification System

V.    Conclusion and Future Work
          A motivation behind using fuzzy rules is that soft decision boundaries provide a smooth transition
between classes and can be made crisp if needed. Machine learning techniques can be applied to spam mail
detection. Many systems are built on fuzzy classification methods, such as predicting the results of basketball
games [7], classifying breast cancer data [9] and classifying online customers [8]. By analysing these papers
we conclude that fuzzy classification provides good results in decision-making problems. In future we plan to
collect sample email data, apply different fuzzy classification methods to it and classify each email as spam
or not spam.

References
[1]    Gaurav, Madhuresh Mishra, Anurag Jain, "Anti-Phishing Techniques: A Review", International Journal of
       Engineering Research and Applications, vol. 2, issue 2, Mar-Apr 2012, pp. 350-355.
[2]    Srikanth Palla and Ram Dantu, "Detecting Phishing in Emails".
[3]    Ian Fette, Norman Sadeh, Anthony Tomasic, "Learning to Detect Phishing in Email", Technical Report
       CMU-ISRI-06-112, June 2006.
[4]    Barry Leiba, Joel Ossher, V.T. Rajan, Richard Segal, Mark Wegman, "SMTP Path Analysis", First Conference
       on Email and Anti-Spam (CEAS) 2004 Proceedings.
[5]    Christine E. Drake, Jonathan J. Oliver, and Eugene J. Koontz, "Anatomy of a Phishing Email", Proceedings
       of First Conference on Email and Anti-Spam (CEAS), July 2004.
[6]    Zadeh, L.A., "Fuzzy Logic", IEEE Computer, pp. 83-93 (1988).
[7]    Krzysztof Trawiński, "A Fuzzy Classification System for Prediction of the Results of the Basketball
       Games", IEEE computer, 2010.
[8]    Andreas Meier and Nicolas Werro, "A Fuzzy Classification Model for Online Customers", Informatica 31,
       pp. 175-182, 2007.
[9]    Ravi Jain, Ajith Abraham, "A Comparative Study of Fuzzy Classification Methods on Breast Cancer Data",
       7th International Work Conference on Artificial and Natural Neural Networks, IWANN'03, Spain, 2003.
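
The inference steps described in Sections III and IV (fuzzify each indicator on a 0 to 10 scale, evaluate weighted if-then rules, and map the result to a 0 to 100 rating with five linguistic labels) can be illustrated with a minimal Python sketch. The paper does not give concrete indicators, breakpoints, rules or a defuzzification method, so everything specific here is an assumption made for illustration: the second indicator name `link_mismatch`, the membership breakpoints, the rule table and its weights, the output centres, the use of min for AND, and the weighted-average defuzzification.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: rises on (a, b), is 1 on [b, c], falls on (c, d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def fuzzify(value):
    """Map a crisp indicator value in [0, 10] to degrees of low/moderate/high.

    The breakpoints are illustrative assumptions, not values from the paper.
    """
    return {
        "low": trapezoidal(value, -1.0, 0.0, 2.0, 5.0),
        "moderate": triangular(value, 2.0, 5.0, 8.0),
        "high": trapezoidal(value, 5.0, 8.0, 10.0, 11.0),
    }


# Assumed centres of the five output labels on the 0..100 rating scale.
OUTPUT_CENTERS = {
    "very legitimate": 10.0,
    "legitimate": 30.0,
    "suspicious": 50.0,
    "phishy": 70.0,
    "very phishy": 90.0,
}

# Hypothetical rule base: (antecedent, consequent label, rule weight in [0, 1]).
RULES = [
    ({"url_length": "low", "link_mismatch": "low"}, "very legitimate", 1.0),
    ({"url_length": "low", "link_mismatch": "moderate"}, "legitimate", 0.8),
    ({"url_length": "moderate", "link_mismatch": "low"}, "legitimate", 0.8),
    ({"url_length": "moderate", "link_mismatch": "moderate"}, "suspicious", 1.0),
    ({"url_length": "low", "link_mismatch": "high"}, "suspicious", 0.7),
    ({"url_length": "high", "link_mismatch": "low"}, "suspicious", 0.7),
    ({"url_length": "moderate", "link_mismatch": "high"}, "phishy", 0.9),
    ({"url_length": "high", "link_mismatch": "moderate"}, "phishy", 0.9),
    ({"url_length": "high", "link_mismatch": "high"}, "very phishy", 1.0),
]


def classify_email(indicators):
    """Return (rating in 0..100, label), or (None, None) if no rule fires."""
    fuzzy = {name: fuzzify(value) for name, value in indicators.items()}
    strengths = {}
    for antecedent, label, weight in RULES:
        # AND of the antecedent terms is taken as the minimum membership.
        degree = min(fuzzy[name][term] for name, term in antecedent.items())
        # The rule weight scales the degree given by the antecedent.
        fired = weight * degree
        strengths[label] = max(strengths.get(label, 0.0), fired)
    total = sum(strengths.values())
    if total == 0.0:
        return None, None
    # Defuzzify as the strength-weighted average of the label centres.
    rating = sum(OUTPUT_CENTERS[lab] * s for lab, s in strengths.items()) / total
    label = min(OUTPUT_CENTERS, key=lambda lab: abs(OUTPUT_CENTERS[lab] - rating))
    return rating, label


print(classify_email({"url_length": 9.0, "link_mismatch": 10.0}))
```

Because the label is derived from a continuous rating, an email whose indicators sit between two linguistic values is pulled toward an intermediate label instead of being forced across a hard threshold, which is the behaviour the paper argues for.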

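The first rule generation method of Section IV, which builds one fuzzy if-then rule per class from the mean and standard deviation of the attribute values, can be sketched in a similar way. The paper does not state the shape of the membership function or how antecedent terms are combined, so this sketch assumes a Gaussian-shaped membership centred on the class mean, a product of memberships as the compatibility of a pattern with a rule, and a single-winner decision. The training patterns are made-up toy values, not data from the paper.

```python
import math
from statistics import mean, pstdev


def generate_rules(patterns, labels):
    """Build one rule per class: a (mean, std) pair for every attribute."""
    by_class = {}
    for pattern, label in zip(patterns, labels):
        by_class.setdefault(label, []).append(pattern)
    rules = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))  # one tuple of values per attribute
        # Guard against a zero standard deviation for constant attributes.
        rules[label] = [(mean(col), max(pstdev(col), 1e-6)) for col in columns]
    return rules


def membership(x, mu, sigma):
    """Assumed Gaussian-shaped membership centred on the class mean."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def compatibility(rule, pattern):
    """Degree to which a pattern matches a rule (product of memberships)."""
    degree = 1.0
    for value, (mu, sigma) in zip(pattern, rule):
        degree *= membership(value, mu, sigma)
    return degree


def classify_pattern(rules, pattern):
    """Assign the class whose rule is most compatible with the pattern."""
    return max(rules, key=lambda label: compatibility(rules[label], pattern))


# Toy training set (illustrative only): two indicators scored on 0..10.
patterns = [[1, 2], [2, 1], [1, 1], [8, 9], [9, 8], [9, 9]]
labels = ["legitimate"] * 3 + ["phishy"] * 3

rules = generate_rules(patterns, labels)
print(classify_pattern(rules, [1.5, 1.5]))
```

Each class ends up with exactly one rule, so the rule base stays small and every rule can be read as "If attribute 1 is about its class mean and attribute 2 is about its class mean, then the class is ...".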
 
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORKMALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
 
A multi layer architecture for spam-detection system
A multi layer architecture for spam-detection systemA multi layer architecture for spam-detection system
A multi layer architecture for spam-detection system
 
Phishing: Analysis and Countermeasures
Phishing: Analysis and CountermeasuresPhishing: Analysis and Countermeasures
Phishing: Analysis and Countermeasures
 
Overview of Existing Methods of Spam Mining and Potential Usefulness of Sende...
Overview of Existing Methods of Spam Mining and Potential Usefulness of Sende...Overview of Existing Methods of Spam Mining and Potential Usefulness of Sende...
Overview of Existing Methods of Spam Mining and Potential Usefulness of Sende...
 
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
 
Identification of Spam Emails from Valid Emails by Using Voting
Identification of Spam Emails from Valid Emails by Using VotingIdentification of Spam Emails from Valid Emails by Using Voting
Identification of Spam Emails from Valid Emails by Using Voting
 
NetworkPaperthesis1
NetworkPaperthesis1NetworkPaperthesis1
NetworkPaperthesis1
 
A multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detectionA multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detection
 
IJSRED-V2I4P0
IJSRED-V2I4P0IJSRED-V2I4P0
IJSRED-V2I4P0
 
Day3-0930-Green-Intent-based-approach-to-detect-email-account-compromise.pdf
Day3-0930-Green-Intent-based-approach-to-detect-email-account-compromise.pdfDay3-0930-Green-Intent-based-approach-to-detect-email-account-compromise.pdf
Day3-0930-Green-Intent-based-approach-to-detect-email-account-compromise.pdf
 
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
PUMMP: PHISHING URL DETECTION USING MACHINE LEARNING WITH MONOMORPHIC AND POL...
 

The International Journal of Engineering and Science (The IJES)

Date of Submission: 17 December 2012        Date of Publication: 05 January 2013

I. Introduction
Mail spammers can be categorized by their intent. Some spammers are telemarketers, who broadcast unsolicited emails to several hundred or several thousand email users. They do not have a specific target, but blindly send the broadcast and expect a very limited rate of return. The next category comprises the opt-in spammers, who keep sending unsolicited messages even though the recipient has little or no interest in them. In some cases, they send unrelated topics or marketing material; examples are conference notices, professional news and meeting announcements. The third category of spammers is called phishers. Phishing attackers misrepresent the true sender and steal consumers' personal identity data and financial account credentials. These spammers send spoofed emails and lead consumers to counterfeit websites designed to trick recipients into divulging financial data such as credit card numbers, account usernames, passwords and social security numbers. By hijacking the brand names of banks, e-retailers and credit card companies, phishers often convince the recipients to respond.

II. Literature review and related work
Research on header analysis was reported by Microsoft, IBM, and Cornell University at the 2004 and 2005 anti-spam conferences [4] [5]. Barry Leiba's [4] filtering technique makes use of the path travelled by an email, which is extracted from the email header. The authors analysed the end units in the path but not the end-to-end path. They claim that their approach complements existing filters but does not work as a standalone mechanism. Drake et al. [5], in "Anatomy of a Phishing Email", discuss the tricks employed by email scammers in phishing emails. In "Anti-Phishing Techniques: A Review" [1], the authors proposed a content filtering methodology in which the content of an email is filtered using machine learning methods such as Bayesian Additive Regression Trees (BART) and support vector machines (SVM). They also used a genetic algorithm to detect phishing email, but the problem with this algorithm is that it works for only one rule; with more than one rule, the algorithm becomes more complex. The character-based anti-phishing approach checks the URLs most often used by attackers, but this approach results in false positives. In "Detecting Phishing in Emails" [2], the authors suggest classification based on three kinds of analysis of the header: DNS-based header analysis, Social
Network analysis and Wantedness analysis. In the DNS-based header analysis, they classified the corpus into 8 buckets and used social network analysis to further reduce the false positives. They introduced the concepts of wantedness and credibility, and derived equations to calculate the wantedness values of the email senders. Finally, they used credibility together with all three analyses to classify phishing emails. The method resulted in far fewer false positives. They plan to perform two more analyses on the incoming email traffic: i) end-to-end path analysis, by which they try to establish the legitimacy of the path taken by an email; and ii) relay analysis, by which they verify the trustworthiness and reputation of the relays participating in the relaying of emails. They combine all the methods, i) path analysis, ii) relay analysis, iii) DNS-based header analysis and iv) social network analysis, to develop an email classifier that classifies the incoming email traffic as i) phishing emails, ii) socially wanted emails (legitimate emails) or iii) socially unwanted emails (spam emails). They concentrate only on the email header, not on the content of the email. The paper "Learning to Detect Phishing in Email" [3] proposed a machine learning approach that uses a random forest classifier trained with 10-fold cross-validation, but it only categorizes email as spam or not spam. It does not classify mail as legitimate, spam or phishing.

III. Fuzzy Systems
Fuzzy logic was invented by Zadeh [6] in 1965 for handling uncertain and imprecise knowledge in real-world applications. It has proved to be a powerful tool for decision-making and for handling and manipulating imprecise and noisy data. The first major commercial application was in the area of cement kiln control. This requires that an operator monitor four internal states of the kiln, control four sets of operations, and dynamically manage 40 or 50 "rules of thumb" about their interrelationships, all with the goal of controlling a highly complex set of chemical interactions. One such rule is "If the oxygen percentage is rather high and the free-lime and kiln-drive torque rate is normal, decrease the flow of gas and slightly reduce the fuel rate". The notion central to fuzzy systems is that truth values (in fuzzy logic) or membership values (in fuzzy sets) are indicated by a value in the range [0.0, 1.0], with 0.0 representing absolute falseness and 1.0 representing absolute truth. A fuzzy system is characterized by a set of linguistic statements based on expert knowledge. The expert knowledge is usually in the form of "if-then" rules.

A fuzzy set A in X is characterized by a membership function, which is easily implemented by fuzzy conditional statements of the form "If antecedent Then consequent". For example, if the antecedent is true to some degree of membership, then the consequent is also true to that same degree.

In a fuzzy classification system, a case or an object can be classified by applying a set of fuzzy rules based on the linguistic values of its attributes. Every rule has a weight, which is a number between 0 and 1, and this is applied to the number given by the antecedent. Applying a rule involves two distinct parts. The first part involves evaluating the antecedent: fuzzifying the input and applying any necessary fuzzy operators. The second part requires application of that result to the consequent, known as inference. To build a fuzzy classification system, the most difficult task is to find a set of fuzzy rules pertaining to the specific classification problem.

A fuzzy inference system is a rule-based system that uses fuzzy logic, rather than Boolean logic, to reason about data. Its basic structure includes four main components: (1) a fuzzifier, which translates crisp (real-valued) inputs into fuzzy values; (2) an inference engine, which applies a fuzzy reasoning mechanism to obtain a fuzzy output; (3) a defuzzifier, which translates this latter output into a crisp value; and (4) a knowledge base, which contains both an ensemble of fuzzy rules, known as the rule base, and an ensemble of membership functions, known as the database. The decision-making process is performed by the inference engine using the rules contained in the rule base. These fuzzy rules define the connection between input and output fuzzy variables.

IV. Proposed fuzzy classification based approach
To detect phishing mail, we propose four direct rule generation methods based on fuzzy classification. The first method generates fuzzy if-then rules using the mean and the standard deviation of attribute values. The second generates fuzzy if-then rules using the histogram of attribute values. The third generates fuzzy if-then rules with certainty by partitioning each attribute into homogeneous fuzzy sets. In the fourth approach, only overlapping areas are partitioned. The first two approaches generate a single fuzzy if-then rule for each class by specifying the membership function of each antecedent fuzzy set using the information about attribute values of training patterns. The other two approaches are based on fuzzy grids with homogeneous fuzzy partitions of each attribute. To start with these methods, linguistic descriptors such as high, low and medium are assigned to a range of values for each key phishing characteristic indicator. Valid ranges of the inputs are considered and divided into classes, or fuzzy sets. For example, the length of a URL
address can range from "low" to "high" with other values in between. We cannot specify clear boundaries between classes. The degree to which a value of a variable belongs to a selected class is called the degree of membership. A membership function is designed for each phishing characteristic indicator; it is a curve that defines how each point in the input space is mapped to a membership value in [0, 1]. The linguistic values assigned to each phishing indicator are low, moderate and high, while the linguistic values assigned to the mail itself are Very legitimate, Legitimate, Suspicious, Phishy and Very phishy (using triangular and trapezoidal membership functions). Each input ranges from 0 to 10, while the output ranges from 0 to 100. The flow of our proposed system is as follows:

Figure 1: Flow of Proposed Fuzzy Classification System

V. Conclusion and Future Work
A motivation for using fuzzy rules is that their soft decision boundaries provide a smooth transition between classes and can be made crisp if needed. Machine learning techniques can be applied to spam mail detection. Many systems have been built on fuzzy classification methods, such as predicting the results of basketball games [7], classifying breast cancer data [9] and classifying online customers [8]. By analysing these papers we conclude that fuzzy classification provides good results in decision-making problems. In future we are going to collect sample email data, apply different fuzzy classification methods to it, and classify email as spam or non-spam.

References
[1] Gaurav, Madhuresh Mishra, Anurag Jain, "Anti-Phishing Techniques: A Review", International Journal of Engineering Research and Applications, vol. 2, issue 2, Mar-Apr 2012, pp. 350-355.
[2] Srikanth Palla and Ram Dantu, "Detecting Phishing in Emails".
[3] Ian Fette, Norman Sadeh, Anthony Tomasic, "Learning to Detect Phishing in Email", Technical Report CMU-ISRI-06-112, June 2006.
[4] Barry Leiba, Joel Ossher, V.T. Rajan, Richard Segal, Mark Wegman, "SMTP Path Analysis", First Conference on Email and Anti-Spam (CEAS) 2004 Proceedings.
[5] Christine E. Drake, Jonathan J. Oliver, and Eugene J. Koontz, "Anatomy of a Phishing Email", Proceedings of First Conference on Email and Anti-Spam (CEAS), July 2004.
[6] Zadeh, L.A., "Fuzzy Logic", IEEE Computer, pp. 83-93 (1988).
[7] Krzysztof Trawiński, "A Fuzzy Classification System for Prediction of the Results of the Basketball Games", IEEE Computer, 2010.
[8] Andreas Meier and Nicolas Werro, "A Fuzzy Classification Model for Online Customers", Informatica 31, pp. 175-182, 2007.
[9] Ravi Jain, Ajith Abraham, "A Comparative Study of Fuzzy Classification Methods on Breast Cancer Data", 7th International Work Conference on Artificial and Natural Neural Networks, IWANN'03, Spain, 2003.
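
As an illustration of the fuzzification and weighted-rule steps described in Section IV, the following Python sketch maps a single indicator score in the range 0 to 10 (for example, a URL-length score) onto the linguistic values low, moderate and high using trapezoidal and triangular membership functions, and then combines weighted if-then rules into an output on the 0 to 100 scale. The breakpoints of the fuzzy sets, the rule table, the rule weights and the representative output values are all hypothetical and are chosen only for illustration; the paper does not specify them. The defuzzification used here (a weighted average of representative output values) is a simplification assumed for brevity, not a method stated in the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical fuzzy sets for one input indicator on the 0..10 scale.
INPUT_SETS = {
    "low": lambda x: trapezoidal(x, 0, 0, 2, 4),
    "moderate": lambda x: triangular(x, 2, 5, 8),
    "high": lambda x: trapezoidal(x, 6, 8, 10, 10),
}

# Hypothetical representative values of the output labels on the 0..100 scale.
OUTPUT_PEAKS = {
    "Very legitimate": 10.0,
    "Legitimate": 30.0,
    "Suspicious": 50.0,
    "Phishy": 70.0,
    "Very phishy": 90.0,
}

# Hypothetical rule base: (antecedent label, consequent label, rule weight in [0, 1]).
RULES = [
    ("low", "Legitimate", 1.0),
    ("moderate", "Suspicious", 0.8),
    ("high", "Phishy", 0.9),
]


def fuzzify(x):
    """Return the degree of membership of x in every input fuzzy set."""
    return {label: mu(x) for label, mu in INPUT_SETS.items()}


def classify(x):
    """Apply the weighted rules to x and return (best label, crisp score in 0..100)."""
    memberships = fuzzify(x)
    # Firing strength of a rule = rule weight * degree to which its antecedent holds.
    strengths = [(out, w * memberships[ant]) for ant, out, w in RULES]
    total = sum(s for _, s in strengths)
    if total == 0:
        raise ValueError("input lies outside the range covered by the fuzzy sets")
    # Simplified defuzzification: weighted average of the output peaks.
    score = sum(s * OUTPUT_PEAKS[out] for out, s in strengths) / total
    best_label = max(strengths, key=lambda pair: pair[1])[0]
    return best_label, score


if __name__ == "__main__":
    for value in (0, 3, 7, 10):
        print(value, fuzzify(value), classify(value))
```

A score of 7 belongs partly to moderate and partly to high, so two rules fire at once and the crisp output falls between the Suspicious and Phishy values; this overlap is what gives the soft class boundaries discussed in the conclusion.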