Proposed efficient algorithm to filter spam using machine learning techniques
Rozeena Saleha
MS in Information Security
rozeenasaleha15@gmail.com
Abstract
 Electronic spam is among the most troublesome Internet phenomena, challenging large global companies including AOL, Google, Yahoo and Microsoft. Spam causes various problems that may, in turn, cause economic losses. It causes traffic problems and bottlenecks that limit memory space, computing power and speed, and it costs users time to remove.
Problem it addresses
 Spam causes traffic problems and bottlenecks that limit:
 memory space,
 computing power,
 speed.
 Spam causes users to spend time removing it.
Architecture of spam filtering rules and existing methods
 This study describes three machine-learning algorithms, including a multilayer perceptron model, to filter spam from valid emails with low error rates and high efficiency.
 Black list/white list,
 Bayesian classifying algorithm,
 Keyword matching and header information analysis
Methodology
 Decision-based systems
 Bayesian classifiers
 Support vector machine
 Neural networks
 Sample-based methods
 A decision tree classifier, a multilayer perceptron and a naïve Bayes classifier are used in the proposed model.
Multilayer perceptron (MLP)
The model delivers information by activating input neurons containing values
labelled on them. Activation of neurons is calculated in the middle or output
layer, as follows:
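As an illustration of the activation step described above, the following minimal sketch computes the activation of each neuron in a hidden or output layer as a sigmoid of the weighted sum of the previous layer's activations. The sigmoid function, the bias terms, the layer sizes and all weight values are assumptions chosen for illustration; they are not taken from the proposed model.

```python
import math

def sigmoid(z):
    """Logistic activation function (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_activations(inputs, weights, biases):
    """Activation of neuron j: sigmoid(sum_i weights[j][i] * inputs[i] + biases[j])."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def mlp_forward(features, hidden_w, hidden_b, output_w, output_b):
    """Propagate input values through one hidden layer to the output layer."""
    hidden = layer_activations(features, hidden_w, hidden_b)
    return layer_activations(hidden, output_w, output_b)

# Hypothetical 3-feature email vector, 2 hidden neurons, 1 output neuron.
features = [1.0, 0.0, 1.0]
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]

spam_score = mlp_forward(features, hidden_w, hidden_b, output_w, output_b)[0]
print(f"output activation: {spam_score:.3f}")
```

In a trained network the weights would be learned from the labelled emails; here they are fixed by hand only to show how the activation is propagated.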
Decision Tree Classifier
A C4.5 tree is modelled as follows. A training set is a set of base tuples used to determine the classes related to these tuples. A tuple X is represented by an attribute vector X = (x1, x2, …, xn). Assume that a tuple belongs to a predefined class that is determined by an attribute called the class label. The training set is randomly selected from the base; this step is called the learning step. This technique is very efficient and is used extensively in classification.
 A node of the tree represents a test on an attribute;
 A branch exiting from a node represents a possible output of a test.
 A leaf represents a class label.
 A decision tree includes a rule set by which objective functions can be predicted. The J48 algorithm is WEKA's implementation of C4.5.
 The algorithm used for this model builds the tree with a greedy technique and classifies a sample by following the decision tree.
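The node, branch and leaf roles listed above can be sketched as a small data structure. This is only an illustration of classifying a sample with an already built tree; the attribute names and the tree itself are hypothetical, and the greedy construction used by C4.5 is not shown.

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    label: str  # class label, e.g. "L" (legitimate) or "S" (spam)

@dataclass
class Node:
    attribute: str                                # attribute tested at this node
    branches: dict = field(default_factory=dict)  # test outcome -> subtree

def classify(tree, sample):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[sample[tree.attribute]]
    return tree.label

# Hypothetical tree: test the black list first, then a keyword match.
tree = Node("sender_blacklisted", {
    True: Leaf("S"),
    False: Node("spam_keyword", {True: Leaf("S"), False: Leaf("L")}),
})

print(classify(tree, {"sender_blacklisted": False, "spam_keyword": True}))
```

Each path from the root to a leaf corresponds to one rule of the rule set, for example "sender not blacklisted and spam keyword present implies S".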
Naïve Bayesian classifier
Probability distribution in compressed form.
An important problem with Bayes theory is the independence of random variables: without it, the probability formula is difficult to calculate. The naïve Bayes classifier therefore assumes that the features are independent given the class.
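To make the role of the independence assumption concrete, the sketch below scores each class c as P(c) multiplied by the product of P(x_i | c) over the features, which is only valid if the features are treated as independent given the class. The binary features, the toy training samples and the Laplace smoothing are assumptions made for illustration, not details of the proposed model.

```python
import math
from collections import defaultdict

def train(samples):
    """Count classes and (label, feature index, value) occurrences."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(int)
    for features, label in samples:
        class_counts[label] += 1
        for i, value in enumerate(features):
            feature_counts[(label, i, value)] += 1
    return class_counts, feature_counts

def predict(features, class_counts, feature_counts):
    """Pick the class maximising log P(c) + sum_i log P(x_i | c)."""
    total = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n in class_counts.items():
        logp = math.log(n / total)
        for i, value in enumerate(features):
            # Laplace smoothing; each binary feature has 2 possible values.
            logp += math.log((feature_counts[(label, i, value)] + 1) / (n + 2))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Hypothetical features: (contains_spam_keyword, sender_in_white_list).
samples = [
    ((1, 0), "S"), ((1, 0), "S"), ((0, 0), "S"),
    ((0, 1), "L"), ((0, 1), "L"), ((1, 1), "L"),
]
class_counts, feature_counts = train(samples)
print(predict((1, 0), class_counts, feature_counts))
```

Because each feature contributes its own factor, only per-feature counts are needed rather than the full joint distribution, which is what keeps the calculation tractable.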
Extraction of features and implementation of the considered algorithm
The work here was based on rules for proper scoring in terms of the efficiency of rules. The considered rules were provided in three forms:
1) email header information analysis,
2) keyword matching,
3) main body of the message.
A score was finally obtained for these rules.
Table
Table 1. Series of rules to assign a score to the received email.
 Via meaning of the name
 Via domain names
 Via blocked IP number
 By detecting apostrophe
 Via addresses in the black list
 Via addresses in the white list
 Via type of content
 Via bounded content
 Via content of name
 Via undeclared addresses of recipients
 Via main header
 Receiving from an address and sending to a similar address
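A rule set of this kind can be sketched as a function that adds or subtracts points per rule. Only a few of the rules above are shown, and the weights, field names, addresses and keyword list are hypothetical values chosen for illustration; the scores used in the study are not given here.

```python
def score_email(email, blacklist, whitelist, spam_keywords):
    """Return a spam score for an email; a higher score is more suspicious."""
    score = 0
    sender = email["from"].lower()
    domain = sender.rsplit("@", 1)[-1]

    # Black list / white list rules (hypothetical weights).
    if sender in blacklist or domain in blacklist:
        score += 5
    if sender in whitelist:
        score -= 5

    # Keyword matching on the main body of the message.
    body = email["body"].lower()
    score += sum(2 for keyword in spam_keywords if keyword in body)

    # Undeclared recipient address.
    recipient = email.get("to", "").lower()
    if not recipient:
        score += 1
    # Received from an address and sent to the same address.
    elif recipient == sender:
        score += 1
    return score

blacklist = {"bad.example"}
whitelist = {"alice@corp.example"}
spam_keywords = {"free", "prize"}

spam = {"from": "promo@bad.example", "to": "", "body": "WIN a FREE prize now"}
legit = {"from": "alice@corp.example", "to": "bob@corp.example", "body": "Meeting at 10"}

print(score_email(spam, blacklist, whitelist, spam_keywords))
print(score_email(legit, blacklist, whitelist, spam_keywords))
```

The per-rule outcomes, or the total score, can then serve as the feature vector passed to a classifier.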
Accuracy
 The primary dataset included 750 valid emails and 750 spam emails. To extract the feature vector of an email, the following methods were used:
1) email header review,
2) keyword review,
3) black list and white list.
 Labels of a class in the proposed model were L and S; the former represents Legitimate and the latter Spam.
 Three classifiers, namely naïve Bayes, C4.5 decision tree and MLP neural network, were run on the training data with the WEKA software.
 The training data were tested in terms of message header information, black and white lists, and keyword review. Efficiency and accuracy of training data were evaluated by 10 classes of reliably valid data. An accuracy factor was calculated as follows.
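A minimal sketch of an accuracy calculation, assuming the commonly used definition of accuracy as the number of correctly classified emails divided by the total number of emails. The labels L and S follow the text; the label sequences are made-up illustrative data, not results from the study.

```python
def accuracy(actual, predicted):
    """Fraction of emails whose predicted label matches the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Hypothetical labels for eight emails (L = Legitimate, S = Spam).
actual    = ["L", "L", "S", "S", "S", "L", "S", "L"]
predicted = ["L", "S", "S", "S", "L", "L", "S", "L"]

print(accuracy(actual, predicted))  # 6 of 8 correct -> 0.75
```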
Conclusion
Two other basic concepts are used for spam filtering operations: false positive and false negative. The former refers to valid emails wrongly classified as spam, and the latter refers to spam classified as valid email. The false positive rate of a classifier contributes to its efficiency. Table 1 shows the efficiency of the above classifiers.
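The two error types can be counted as in the sketch below. It assumes the common convention that spam (S) is the positive class, so a false positive is a legitimate email flagged as spam and a false negative is a spam email that passes as legitimate. The label sequences are made-up illustrative data.

```python
def error_rates(actual, predicted, positive="S", negative="L"):
    """Return (false positive rate, false negative rate) with spam as positive."""
    pairs = list(zip(actual, predicted))
    false_pos = sum(a == negative and p == positive for a, p in pairs)
    false_neg = sum(a == positive and p == negative for a, p in pairs)
    n_negative = sum(a == negative for a in actual)
    n_positive = sum(a == positive for a in actual)
    fp_rate = false_pos / n_negative if n_negative else 0.0
    fn_rate = false_neg / n_positive if n_positive else 0.0
    return fp_rate, fn_rate

# Hypothetical labels: four legitimate emails followed by four spam emails.
actual    = ["L", "L", "L", "L", "S", "S", "S", "S"]
predicted = ["L", "L", "L", "S", "S", "S", "L", "L"]

fp_rate, fn_rate = error_rates(actual, predicted)
print(fp_rate, fn_rate)  # 1 of 4 legitimate flagged, 2 of 4 spam missed
```

In spam filtering a false positive is usually the costlier error, since a legitimate message may never be read.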
Efficiency of these three techniques depends on the following factors:
1) valid recognition rate,
2) data training time and
3) false positive rate.
The efficiency of the MLP neural network was better than that of the other models. MLP requires more time to develop the model. The J48 and naïve Bayes algorithms require more time on learning experimental data.
References
[1]
I. Anderouysopoulos, J. Koutsias, K.V. Chandrianos, G. Paliouras, C. SpyrpolousAn evaluation of naïve
Bayesian anti-spam filtering
Proceeding of 11th Euro Conference on Match Learn (2000)
[2]
I. Anderouysopoulos, J. Koutsias, K.V. Chandrianos, G. Paliouras, C. SpyrpolousAn experimental comparison of
naïve Bayesian and keyword-based anti-spam filtering with personal e-mail messages
Proceeding of the an International ACM SIGIR Conference on Res and Devel in Inform Retrieval (2000)
[3]
A. Asuncion, D. NewmanUCI Machine Learning Repository
(2007)
http://www.ics.uci.edu/mlearn/MLRRepository.html
[4]
E. Blanzieri, A. BrylA Survey of Learning-based Techniques of Email Spam Filtering Tech. Rep
DIT-06-056
University of Trento, Information Engineering and Computer Science Department (2008)
[5]
J. Clark, I. Koprinska, J. PoonA neural network-based approach to automated email classification
Proceeding of the IEEE/WIC International Conference on Web Intelligence (2003)
[6]
S.J. Delany, D. BridgeTextual case-based reasoning for spam filtering: a comparison of feature-based and
feature-free approaches
Artif. Intell. Rev., 26 (1–2) (2006), pp. 75-87
CrossRefView Record in Scopus
[7]
F. Fdez Riverola, E. Iglesias, F. Diaz, J.R. Mendez, J.M. ChorchodoSpam hunting: an instance-based reasoning
system for spam labeling anf filtering
Decis. Support Syst., 43 (3) (2007), pp. 722-736
ArticlePDF (492KB)View Record in Scopus
[8]
A. SeewaldAn evaluation of Naïve Bayes variants in content-based learning for spam filtering
Intell. Data Anal. (2009)
[9]
Ahmed KhorsiAn overview of content-based spam filtering techniques
Informatica, 31 (3) (October 2007), pp. 269-277
View Record in Scopus

Diamond Application Development Crafting Solutions with Precision
 

Machine learning

  • 1.
  • 2. Proposed efficient algorithm to filter spam using machine learning techniques Rozeena Saleha MS in Information Security rozeenasaleha15@gmail.com
  • 3. Abstract: Electronic spam is the most troublesome Internet phenomenon challenging large global companies, including AOL, Google, Yahoo and Microsoft. Spam causes various problems that may, in turn, cause economic losses. Spam causes traffic problems and bottlenecks that limit memory space, computing power and speed. Spam causes users to spend time removing it.
  • 4. Problem it addresses
      • Spam causes traffic problems and bottlenecks that limit memory space, computing power and speed.
      • Spam causes users to spend time removing it.
  • 5. Architecture of spam filtering rules and existing methods
      • This study describes three machine-learning algorithms to filter spam from valid emails with low error rates and high efficiency using a multilayer perceptron model.
      • Black list/white list
      • Bayesian classifying algorithm
      • Keyword matching and header information analysis
  • 6. Methodology
      • Decision-based systems
      • Bayesian classifiers
      • Support vector machine
      • Neural networks
      • Sample-based methods
      • Decision tree classifier, multilayer perceptron and naïve Bayes classifier provided in the proposed model.
  • 7. Multilayer perceptron (MLP): The model propagates information by activating the input neurons with the values assigned to them. The activation of each neuron in the hidden (middle) or output layer is calculated as follows:
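
A common textbook form of this activation is given here as an assumption, because the transcript does not reproduce the slide's formula. The symbols a_i, w_ij and o_j and the choice of a sigmoid for σ are illustrative, not necessarily the author's notation:

```latex
a_i = \sigma\Big(\sum_{j} w_{ij}\, o_j\Big),
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}}
```

Here a_i is the activation of neuron i, j ranges over the neurons of the previous layer, w_ij is the weight of the connection from neuron j to neuron i, and o_j is the output of neuron j.
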
  • 8. A C4.5 tree is modelled as follows. A training set is a set of base tuples used to determine the classes related to these tuples. A tuple X is represented by an attribute vector X = (x1, x2, …, xn). Assume that a tuple belongs to a predefined class that is determined by an attribute called the class label. The training set is randomly selected from the base; this step is called the learning step. This technique is very efficient and is used extensively in classification.
  • 9. Decision Tree Classifier
      • A node of the tree represents a test on an attribute.
      • A branch exiting from a node represents a possible outcome of the test.
      • A leaf represents a class label.
      • A decision tree includes a rule set by which objective functions can be predicted. The J48 algorithm is the implementation of C4.5 provided in WEKA.
      • The algorithm used for this model is greedy and classifies a sample based on a decision tree.
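
The node, branch and leaf roles above can be sketched in a few lines of Python. This is a minimal illustration, not the J48 implementation: the tree is hand-built rather than learned, and the attribute names `sender_blacklisted` and `has_spam_keyword` are hypothetical.

```python
# Hypothetical hand-built tree; attribute names are illustrative only.
# Inner nodes are dicts (a test on an attribute), leaves are class labels.
TREE = {
    "attribute": "sender_blacklisted",
    "branches": {
        True: "S",  # leaf: Spam
        False: {
            "attribute": "has_spam_keyword",
            "branches": {True: "S", False: "L"},  # leaves: Spam / Legitimate
        },
    },
}


def classify(tree, sample):
    """Walk from the root to a leaf and return the leaf's class label."""
    node = tree
    while isinstance(node, dict):
        outcome = sample[node["attribute"]]  # run the node's test
        node = node["branches"][outcome]     # follow the matching branch
    return node


label = classify(TREE, {"sender_blacklisted": False, "has_spam_keyword": True})
```

A learned tree would pick each node's attribute greedily from the training data; only the classification walk is shown here.
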
  • 10. Naïve Bayesian classifier: A probability distribution in compressed form. An important issue with Bayes' theorem is the independence of the random variables; without it, the probability formula is difficult to calculate.
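
The role of the independence assumption can be made explicit with the standard naïve Bayes factorization. This is the general textbook form, not a formula taken from the slides; c denotes a class label (L or S) and x_1, …, x_n the features of an email:

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

Without conditional independence of the features given the class, the joint term P(x_1, …, x_n | c) would have to be estimated directly, which is what makes the formula difficult to calculate.
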
  • 11. Extraction of features and implementation of the considered algorithm: The work was based on rules that assign scores according to how effective each rule is. The rules were provided in three forms: 1) email header information analysis, 2) keyword matching, 3) main body of the message. A final score was obtained from these rules.
  • 12. Table 1: Series of rules to assign a score to the received email.
      • Via meaning of the name
      • Via domain names
      • Via blocked IP number
      • By detecting apostrophe
      • Via addresses in the black list
      • Via addresses in the white list
      • Via type of content
      • Via bounded content
      • Via content of name
      • Via undeclared addresses of recipients
      • Via main header
      • Receiving from an address and sending to a similar address
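
One way such a rule set could be turned into a score and a feature vector is sketched below. Everything specific in it is an assumption made for illustration: the weights, the example addresses, the keyword, and the reading of the last rule as "sender equals recipient" are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Email = Dict[str, str]


@dataclass
class Rule:
    name: str
    weight: float                      # hypothetical score contribution
    matches: Callable[[Email], bool]   # test applied to one email


# Hypothetical lists and rules, loosely modelled on a few Table 1 entries.
BLACK_LIST = {"offers@spam.example"}
WHITE_LIST = {"colleague@work.example"}

RULES: List[Rule] = [
    Rule("address in the black list", 5.0, lambda e: e["from"] in BLACK_LIST),
    Rule("address in the white list", -5.0, lambda e: e["from"] in WHITE_LIST),
    Rule("keyword in the body", 2.0, lambda e: "free money" in e["body"].lower()),
    Rule("sender equals recipient", 1.5, lambda e: e["from"] == e["to"]),
]


def score(email: Email) -> float:
    """Sum the weights of all rules that fire for this email."""
    return sum(rule.weight for rule in RULES if rule.matches(email))


def features(email: Email) -> List[int]:
    """Binary feature vector: 1 where a rule fires, 0 otherwise."""
    return [1 if rule.matches(email) else 0 for rule in RULES]


email = {"from": "offers@spam.example", "to": "me@home.example", "body": "FREE MONEY now"}
total = score(email)
vector = features(email)
```

The binary vector is the kind of input a classifier such as naïve Bayes, a decision tree or an MLP could be trained on.
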
  • 13. Accuracy
      • The primary dataset included 750 valid emails and 750 spam emails. To extract the feature vector of an email, the following methods were used: 1) email header review, 2) keyword review, 3) black list and white list.
      • Class labels in the proposed model were L and S, standing for Legitimate and Spam respectively.
      • Three classifiers (naïve Bayes, C4.5 decision tree and MLP neural network) were run on the training data with the WEKA software.
      • The training data were tested in terms of message header information, black and white lists, and keyword review. Efficiency and accuracy on the training data were evaluated using 10 folds of validation data. An accuracy factor was calculated as follows.
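
The usual definition of classification accuracy is given here as an assumption, because the transcript does not reproduce the slide's formula. TP, TN, FP and FN are the counts of true positives, true negatives, false positives and false negatives:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```
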
  • 14. Conclusion: Two other basic concepts are used in spam filtering: false positives and false negatives. With spam as the positive class, a false positive is a valid email wrongly classified as spam, and a false negative is a spam email classified as valid. The false positive rate of a classifier factors into its efficiency. Table 1 shows the efficiency of the above classifiers. The efficiency of these three techniques depends on the following factors: 1) valid recognition rate, 2) data training time and 3) false positive rate. The efficiency of the MLP neural network was better than that of the other models. MLP requires more time to develop the model. The J48 and naïve Bayes algorithms require more time on learning experimental data.
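
The rates discussed in the conclusion can be computed from a list of actual and predicted labels. The sketch below assumes the common convention that spam (S) is the positive class, so a false positive is a legitimate email flagged as spam. The function names and the example labels are hypothetical and are not results from the study.

```python
def confusion_counts(actual, predicted, positive="S"):
    """Count true/false positives and negatives, with `positive` as the spam label."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1   # spam correctly flagged
            else:
                fp += 1   # legitimate email flagged as spam
        else:
            if a == positive:
                fn += 1   # spam that slipped through
            else:
                tn += 1   # legitimate email correctly passed
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Accuracy, false positive rate and false negative rate from the counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Made-up labels for illustration only.
actual = ["S", "S", "S", "L", "L", "L", "L", "S"]
predicted = ["S", "S", "L", "L", "L", "S", "L", "S"]
counts = confusion_counts(actual, predicted)
summary = rates(*counts)
```
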