Spam filtering with the
Naïve Bayes algorithm
Done by:
AKSHAY PAL
ASHUTOSH RANJHAN
Abstract
The Naïve Bayes algorithm is a machine learning algorithm for classification problems. It is
primarily used for text classification, which involves high-dimensional data sets. A few examples
are spam filtering, sentiment analysis, and classifying news articles.
Assigning a document to a particular category remains challenging because of
the large number of features in the data sets.
Naïve Bayes is very popular in commercial and open-source anti-spam e-mail filters. Naïve
Bayes has been studied extensively since the 1950s.
Naïve Bayes is potentially a good document classification model.
We discuss the mathematical basis of the Naïve Bayes classifier for spam
filtering.
Introduction
Several machine learning algorithms have
been employed in anti-spam e-mail filtering,
including algorithms that are considered top
performers in text classification, such as Boosting,
Support Vector Machines, decision trees, neural
networks, and logistic regression.
However, Naive Bayes (NB) classifiers currently appear to
be particularly popular in commercial and open-source
spam filters. This
is probably due to their simplicity, which
makes them easy to implement, their linear
computational complexity, and their accuracy,
which in spam filtering is comparable to that of
more elaborate learning algorithms.
The Naïve Bayes algorithm is called 'naïve' because it makes the assumption that
the occurrence of a certain feature is independent of the occurrence of other
features.
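Written as a formula, the independence assumption can be sketched as follows. The symbols are introduced here for illustration only: w_1, ..., w_n are the features (for example, words) of a message and C is a class such as spam or non-spam. Strictly, the features are assumed independent of each other given the class.

```latex
P(w_1, w_2, \dots, w_n \mid C) = \prod_{i=1}^{n} P(w_i \mid C)
```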
Naive Bayes classifiers are a popular statistical technique of e-mail filtering.
They typically use bag of words features to identify spam e-mail, an approach
commonly used in text classification.
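As a minimal sketch of what bag-of-words features look like, the Python snippet below counts tokens in a message and ignores their order. The function name `bag_of_words` and the simple tokenization rule are assumptions made for illustration; real filters typically use more careful tokenizers.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Return token counts for a message, ignoring word order."""
    # Hypothetical tokenizer: lowercase runs of letters, digits, apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)

bag = bag_of_words("Cheap replica watches! Buy cheap watches now.")
print(bag)
```

Each distinct token becomes one feature, which is why text classification leads to high-dimensional data.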
Naive Bayes classifiers work by correlating the use of tokens (typically words,
or sometimes other things), with spam and non-spam e-mails and then using
Bayes' theorem to calculate a probability that an email is or is not spam.
Naive Bayes spam filtering is a baseline technique for dealing with spam that
can tailor itself to the email needs of individual users and give low false
positive spam detection rates that are generally acceptable to users. It is one
of the oldest ways of doing spam filtering, with roots in the 1990s.
Literature survey
• Bayesian algorithms were used to sort and filter email by 1996. Although naive
Bayesian filters did not become popular until later, multiple programs were
released in 1998 to address the growing problem of unwanted email. The first
scholarly publication on Bayesian spam filtering was by Sahami et al. in 1998. That
work was soon thereafter deployed in commercial spam filters. However, in 2002
Paul Graham greatly decreased the false positive rate, so that it could be used on
its own as a single spam filter.
• Variants of the basic technique have been implemented in a number of research
works and commercial software products. Many modern mail clients implement
Bayesian spam filtering. Users can also install separate email filtering programs.
Server-side email filters, such as DSPAM, SpamAssassin, SpamBayes, Bogofilter
and ASSP, make use of Bayesian spam filtering techniques, and the functionality is
sometimes embedded within mail server software itself. CRM114, often cited as a
Bayesian filter, is not intended to use a Bayes filter in production, but includes the
"unigram" feature for reference.
Bayes' theorem
• Bayes: the name refers to the statistician and philosopher
Thomas Bayes and to the theorem named after
him, Bayes' theorem, which is the basis of the
Naïve Bayes algorithm.
• Bayes' theorem:
• Bayes' theorem states that the probability of
event A given B is equal to the probability of
event B given A multiplied by the probability of
A, divided by the probability of B.
• Let us see what Bayes' theorem says.
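In symbols, for two events A and B with P(B) greater than zero, the standard form of Bayes' theorem is:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
```

Here P(A | B) is the probability of A once B is known to have occurred, and P(A) is the probability of A before B is taken into account.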
Mathematical foundation
Bayesian email filters utilize Bayes' theorem. Bayes' theorem is used several times in the context of
spam:
• a first time, to compute the probability that the message is spam, knowing that a given word
appears in this message;
• a second time, to compute the probability that the message is spam, taking into consideration all
of its words (or a relevant subset of them);
• sometimes a third time, to deal with rare words.
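The three uses listed above can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the implementation of any particular filter: the function names, the prior of 0.5, the smoothing strength of 3, and all message counts are hypothetical values chosen for the example.

```python
import math

def word_spamicity(spam_with_word, spam_total, ham_with_word, ham_total,
                   prior_spam=0.5):
    """First use: Pr(spam | word) from message counts via Bayes' theorem."""
    p_word_given_spam = spam_with_word / spam_total
    p_word_given_ham = ham_with_word / ham_total
    numerator = p_word_given_spam * prior_spam
    denominator = numerator + p_word_given_ham * (1.0 - prior_spam)
    # A word never seen in training carries no evidence: fall back to the prior.
    return numerator / denominator if denominator > 0 else prior_spam

def smooth_rare(p_word, occurrences, strength=3.0, prior_spam=0.5):
    """Third use: pull the estimate for a rarely seen word toward the prior."""
    return (strength * prior_spam + occurrences * p_word) / (strength + occurrences)

def combine(probabilities):
    """Second use: combine per-word probabilities, assuming independence.

    Equivalent to prod(p) / (prod(p) + prod(1 - p)), computed in log space
    to avoid floating-point underflow on long messages.
    """
    eps = 1e-9  # keeps log() defined when a probability is exactly 0 or 1
    log_spam = sum(math.log(max(p, eps)) for p in probabilities)
    log_ham = sum(math.log(max(1.0 - p, eps)) for p in probabilities)
    diff = log_ham - log_spam
    if diff > 700:  # exp() would overflow; the evidence is overwhelmingly ham
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))

# Hypothetical counts: messages containing each word out of 100 spam / 100 ham.
p_replica = word_spamicity(40, 100, 1, 100)
p_meeting = word_spamicity(2, 100, 30, 100)
p_rare = smooth_rare(word_spamicity(1, 100, 0, 100), occurrences=1)
score = combine([p_replica, p_meeting, p_rare])
print(round(score, 3))
```

A filter would then compare `score` with a threshold chosen by the user or administrator to decide whether the message is treated as spam.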
Computing the probability that a message containing a given word is spam
Let's suppose the suspected message contains the word "replica". Most people who are used to receiving e-mail
know that this message is likely to be spam, more precisely a proposal to sell counterfeit copies of well-known
brands of watches. The spam detection software, however, does not "know" such facts; all it can do is compute
probabilities.
The formula used by the software to compute this probability is derived from Bayes' theorem.
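A common way to write this formula is shown below. The symbols are introduced here for illustration: S is the event that the message is spam, H the event that it is legitimate (ham), and W the event that the word "replica" appears in the message.

```latex
\Pr(S \mid W) = \frac{\Pr(W \mid S)\,\Pr(S)}{\Pr(W \mid S)\,\Pr(S) + \Pr(W \mid H)\,\Pr(H)}
```

The quantities Pr(W | S) and Pr(W | H) are typically estimated from the fraction of known spam and known legitimate messages that contain the word, while Pr(S) and Pr(H) are the overall probabilities that any message is spam or legitimate.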
Conclusion
• Text classification with the Naïve Bayes algorithm performs comparably to
other classification methods.
• One of the main advantages of Bayesian spam filtering is that it can be trained on
a per-user basis.
• The spam that a user receives is often related to that user's online activities.
• The legitimate e-mails a user receives will tend to be different.
