Spam email detection using machine learning PPT.pptx

This presentation explains how spam emails and SMS messages can be detected using machine learning.

CSMSS
Chh. Shahu College of Engineering, Aurangabad
Seminar On
“Email & SMS Spam Detection”
Guided By:
Dr. S.V. Khidse
Presented By:
Kunal Kalamkar (3271)
Department of Computer Science and Engineering
2021-22
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING, CSMSS, CSCOE, AURANGABAD
Contents
• Introduction
• Technologies
• Libraries
• Machine Learning
• Data set
• Problem definition
• Description of dataset
• Methodology
• Algorithms
• Conclusion
• References
Introduction
In today’s globalized world, email is a primary means of communication, used for
personal, business, corporate, and government correspondence. With the rapid increase in
email usage, there has also been an increase in spam. Spam, also known as junk email,
consists of nearly identical messages sent to numerous recipients. Apart from being
annoying, spam can also pose a security threat to computer systems. It is estimated that
spam cost businesses on the order of $100 billion in 2007. In this project, we use text
mining to perform automatic spam filtering so that email can be used effectively. We try
to identify patterns using data-mining classification algorithms so that emails can be
classified as HAM (legitimate) or SPAM.
Technologies
Technologies used :-
1. Python
2. HTML
3. CSS
4. JavaScript
Python :- Python is an interpreted, object-oriented, high-level programming language
with dynamic semantics, developed by Guido van Rossum.
Libraries :-
1. NumPy
2. Pandas
3. Sklearn
4. NLTK
Libraries
• NumPy:- NumPy is a library for the Python programming language that adds support for
large, multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays. NumPy also forms the foundation
of the Python machine learning stack.
• Pandas:- Pandas is a Python library used for data cleaning and analysis. It provides
features for exploring, cleaning, transforming, and visualizing data.
• NLTK:- NLTK (Natural Language Toolkit) is intended to support research and teaching in
NLP and closely related areas, including empirical linguistics, cognitive science,
artificial intelligence, information retrieval, and machine learning.
Continued…
• Matplotlib:- Matplotlib is a low-level Python library used for data visualization.
It is easy to use and produces MATLAB-like graphs and visualizations. The library is built on
top of NumPy arrays and supports several plot types, such as line charts, bar charts, and histograms.
  • 7. Machine Learning
    Arthur Samuel, an early American leader in the field of computer gaming and artificial intelligence, coined the term “Machine Learning” in 1959 while at IBM. He defined machine learning as “the field of study that gives computers the ability to learn without being explicitly programmed”.
    • Machine learning is programming computers to optimize a performance criterion using example data or past experience.
    • The field of study known as machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.
  • 8. Data set
    A machine learning dataset is a collection of data used to train a model. A dataset acts as a set of examples that teach the machine learning algorithm how to make predictions. A dataset can be defined as “a collection of data that is treated as a single unit by a computer”. This means that a dataset contains many separate pieces of data, yet it can be used as a whole to train an algorithm, with the goal of finding predictable patterns in it.
    How to train the data?
    Training data varies depending on whether you are using supervised or unsupervised learning. Unsupervised learning uses unlabeled data: models are tasked with finding patterns (or similarities and deviations) in the data to make inferences and reach conclusions. With supervised learning, on the other hand, humans must tag, label, or annotate the data according to their criteria in order to train the model to reach the desired conclusion (output). In labeled data, the desired outputs are predetermined.
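As a minimal sketch of the difference between labeled and unlabeled data, the snippet below builds a tiny labeled set in plain Python. The messages and variable names are hypothetical illustrations and are not taken from the seminar's dataset.

```python
# Hypothetical toy messages; not part of the seminar's dataset.
labeled = [
    ("Win a free prize now", "spam"),
    ("Are we still meeting at noon?", "ham"),
    ("Claim your cash reward today", "spam"),
]

# Supervised learning sees both the text and the human-assigned label.
texts = [text for text, _ in labeled]
labels = [label for _, label in labeled]

# Unsupervised learning would receive only the raw texts, without labels.
unlabeled = list(texts)

print(labels)
```

In a spam filter, the label column is what makes the problem a supervised classification task.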
  • 9. Problem Definition
    • Short Message Service (SMS) and email have grown into a multi-billion-dollar commercial industry.
    • SMS spam is still not as common as email spam.
    • SMS spam is growing: in 2012, up to 30% of text messages in parts of Asia were spam.
  • 10. Description of Dataset
    The dataset consists of 5574 text messages from the UCI Machine Learning Repository.
    Spam email percentage in the dataset = 12.63268156424581 %
    Ham email percentage in the dataset = 87.37731843575419 %
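Class percentages like the ones above can be computed from the label column alone. The following is a small sketch using only the Python standard library; the function name `class_percentages` and the four toy labels are assumptions made for illustration, not the seminar's code or data.

```python
from collections import Counter


def class_percentages(labels):
    """Return the share of each class as a percentage of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical labels standing in for the dataset's spam/ham column.
labels = ["ham", "ham", "ham", "spam"]
shares = class_percentages(labels)
print(shares)
```

Checking this split early matters because a strongly imbalanced dataset can make plain accuracy look better than the classifier really is.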
  • 11. Methodology
  • 12. Algorithms
    Different algorithms used for email spam detection:-
    1. Deep learning
    2. Naive Bayes
    3. Support Vector Machines
    4. K-Nearest Neighbour
    5. Rough Sets
    6. Random Forests
    7. Multinomial naive Bayes
  • 13. Classification of Algorithms (Naïve Bayes)
    The naïve Bayes (NB) algorithm is applied to the final extracted features. Its speed and simplicity, along with its high accuracy, make it a desirable classifier for spam detection problems. Applying naïve Bayes with the multinomial event model to the dataset, using 10-fold cross-validation, gives the results in Table 1.
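To make the idea concrete, here is a minimal from-scratch sketch of multinomial naïve Bayes with Laplace smoothing and a simple k-fold evaluation, written with the Python standard library only. It is not the seminar's implementation (which lists Sklearn among its libraries); the whitespace tokenizer, the helper names, the smoothing value `alpha=1.0`, the interleaved fold assignment, and the six toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_nb(texts, labels):
    """Collect the counts a multinomial naive Bayes model needs."""
    class_counts = Counter(labels)        # documents per class
    word_counts = defaultdict(Counter)    # word frequencies per class
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text, alpha=1.0):
    """Return the class with the highest log posterior for `text`."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            score += math.log((word_counts[label][token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_val_accuracy(texts, labels, k=10):
    """Mean accuracy over k folds; fold i holds out items i, i+k, i+2k, ..."""
    scores = []
    for i in range(k):
        test_idx = set(range(i, len(texts), k))
        if not test_idx:
            continue  # fewer items than folds
        train_x = [t for j, t in enumerate(texts) if j not in test_idx]
        train_y = [y for j, y in enumerate(labels) if j not in test_idx]
        model = train_nb(train_x, train_y)
        hits = sum(predict_nb(model, texts[j]) == labels[j] for j in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical toy messages; not taken from the seminar's dataset.
texts = [
    "win free prize now", "free cash prize", "claim free reward now",
    "lunch at noon today", "see you at the meeting", "call me after lunch",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train_nb(texts, labels)
print(predict_nb(model, "free prize"))
print(predict_nb(model, "lunch at noon"))
accuracy = cross_val_accuracy(texts, labels, k=3)
```

With a real dataset, `k=10` would correspond to the 10-fold setup described above; the toy data here is far too small for the resulting accuracy to mean anything.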
  • 14. Word Cloud for Spam/Ham words
    [Figures: Top Spam Words; Top Ham Words]
  • 15. Which email/SMS is generally longer?
    Here, we calculated the average word count for ham emails and spam emails separately and compared them to see which are generally longer.
    Average Word Count for Ham Emails: 4516.00 words
    Average Word Count for Spam Emails: 653.000 words
    So, it can be concluded that ham emails are generally longer than spam emails.
    [Chart: Spam Avg. vs. Ham Avg.]
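A per-class average word count can be sketched as follows, using only the standard library. The function name `average_word_count`, the whitespace-based word splitting, and the three toy messages are assumptions for illustration and do not reproduce the figures reported above.

```python
from collections import defaultdict


def average_word_count(texts, labels):
    """Return the mean number of whitespace-separated words per class."""
    totals = defaultdict(int)   # total words seen per class
    counts = defaultdict(int)   # number of messages per class
    for text, label in zip(texts, labels):
        totals[label] += len(text.split())
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


# Hypothetical toy messages; not taken from the seminar's dataset.
texts = ["win a free prize", "see you soon", "ok"]
labels = ["spam", "ham", "ham"]
averages = average_word_count(texts, labels)
print(averages)
```

The same per-class grouping could also be used for other length measures, such as character counts.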
  • 16. Home of application
  • 17. Detecting spam messages
  • 18. Detecting ham messages/email
  • 19. Conclusion
    Spam is a major problem in today's world. Spam messages are among the most unwanted messages that end users receive in their daily lives. Spam emails are often nothing but advertisements for a company, or carriers of viruses and similar threats, and they arrive in excessive volume. It is also easy for hackers to gain access to our systems using these spam emails.