MACHINE LEARNING
Project Title: Email-Spam Filtering
Aman Singhla 16212220
Shareesh Bellamkonda 16212926
Vikas Chillar 16212887
Vikas Chhillar
Agenda:
Introduction
Problem Definition
Review Existing Methods
Proposed Methods
Description of Dataset
Description of Source Code
Introduction:
Machine learning focuses on the development of computer programs that can teach themselves to grow
and change when exposed to new data.
Emails fall into two classes: HAM (legitimate mail) and SPAM (unsolicited mail).
Problem definition:
● We consider a dataset of about 33,700 emails that are pre-classified into ham and spam.
● We find the top 10 words for ham emails and for spam emails.
● We determine which emails are generally longer, ham or spam, by calculating the average word count of each class.
Review existing methods:
Most anti-spam programs are designed to do the same job, but each goes about it in a different way. The main filtering techniques are:
● List-based filters.
● Content-based filters.
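The difference between the two techniques can be illustrated with a minimal sketch. The blocked sender, the word list, and the threshold below are hypothetical values chosen for illustration; they do not come from our project.

```python
# Hypothetical example data, not taken from the project.
BLOCKED_SENDERS = {"offers@example.com"}
SPAM_WORDS = {"free", "winner", "prize"}


def list_based_is_spam(sender: str) -> bool:
    """List-based filtering: decide from the sender address alone."""
    return sender.lower() in BLOCKED_SENDERS


def content_based_is_spam(body: str, threshold: int = 2) -> bool:
    """Content-based filtering: decide from the words in the message."""
    words = body.lower().split()
    hits = sum(1 for word in words if word in SPAM_WORDS)
    return hits >= threshold
```

A list-based filter never looks at the message text, while a content-based filter ignores who sent it. The learned classifiers in this project belong to the content-based family.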
Proposed method:
We used two algorithms for our work:
1) Support Vector Machine (SVM)
2) Naïve Bayes algorithm
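To show how the second algorithm works, here is a minimal pure-Python sketch of multinomial Naive Bayes with add-one smoothing. It is an illustration under our own assumptions (tokenised input, `alpha=1.0`, unseen words skipped at prediction time), not the project's source code. The SVM is not sketched here.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenised documents."""
    classes = sorted(set(labels))
    vocab = {word for doc in docs for word in doc}
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    model = {"log_prior": {}, "log_likelihood": {}}
    for c in classes:
        # Prior: share of training documents in class c.
        model["log_prior"][c] = math.log(doc_counts[c] / len(docs))
        # Smoothed likelihood of each vocabulary word given class c.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_likelihood"][c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return model


def predict(model, doc):
    """Return the class with the highest posterior log-probability."""
    scores = {}
    for c, log_prior in model["log_prior"].items():
        likelihood = model["log_likelihood"][c]
        # Words never seen in training are skipped (an assumption of this sketch).
        scores[c] = log_prior + sum(likelihood[w] for w in doc if w in likelihood)
    return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.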
Receiver Operating Characteristic (ROC):
The Receiver Operating Characteristic curve (ROC curve) is a plot of the true positive rate against the false positive rate at the different possible cutpoints of a classifier's decision score.
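The definition above can be made concrete with a short sketch that computes the points of the curve. It assumes that a higher score means "more likely spam", that labels are 1 for spam and 0 for ham, and that both classes are present; the function names are ours.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per cutpoint."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Each distinct score is a cutpoint: predict spam when score >= cutpoint.
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

A classifier that ranks every spam email above every ham email gives an area of 1.0, while random scoring gives about 0.5.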
Description of Dataset:
Dataset            Ham emails   Spam emails
Enron Dataset 1    3,672        1,500
Enron Dataset 2    4,361        1,496
Enron Dataset 3    4,012        1,500
Enron Dataset 4    1,500        4,500
Enron Dataset 5    1,500        3,675
Enron Dataset 6    1,500        4,500
Total              16,545       17,171
Total emails: 33,716
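The totals can be checked with a few lines of arithmetic. The dictionary below simply restates the per-dataset counts listed above; the variable names are ours.

```python
# Per-dataset counts as (ham, spam), restated from the list above.
enron_counts = {
    1: (3672, 1500),
    2: (4361, 1496),
    3: (4012, 1500),
    4: (1500, 4500),
    5: (1500, 3675),
    6: (1500, 4500),
}

total_ham = sum(ham for ham, _ in enron_counts.values())
total_spam = sum(spam for _, spam in enron_counts.values())
total = total_ham + total_spam
spam_share = total_spam / total  # fraction of all emails that are spam
```

The spam share comes out at roughly one half, so the combined dataset is close to balanced even though the individual Enron subsets are not.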
Description of Source Code:
➢ Split dataset 70-30 (training and testing)
➢ Make dictionary and extract features
➢ Top 10 words for ham and spam
➢ Which emails are longer?
➢ ROC curve
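The first two steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the random seed, the dictionary size of 3000, the whitespace tokenisation, and the function names are all assumptions.

```python
import random
from collections import Counter


def split_70_30(emails, labels, seed=0):
    """Shuffle the data and split it into 70% training and 30% testing."""
    indices = list(range(len(emails)))
    random.Random(seed).shuffle(indices)
    cut = len(indices) * 7 // 10
    train = [(emails[i], labels[i]) for i in indices[:cut]]
    test = [(emails[i], labels[i]) for i in indices[cut:]]
    return train, test


def make_dictionary(train_emails, size=3000):
    """Keep the `size` most frequent alphabetic words of the training emails."""
    counts = Counter()
    for text in train_emails:
        counts.update(w for w in text.lower().split() if w.isalpha())
    return [word for word, _ in counts.most_common(size)]


def extract_features(text, dictionary):
    """Bag-of-words vector: how often each dictionary word occurs in `text`."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in dictionary]
```

Building the dictionary from the training emails only keeps information from the test set out of the features.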
Results:
Number of train emails (training set):
About 70% of 33,716 emails = 23,596 emails
Number of test emails (test set):
About 30% of 33,716 emails = 10,112 emails
Confusion Matrix:
For Multinomial Naive Bayes: [[4822  143]
 [ 115 5031]]
For Support Vector Machine: [[4843  122]
 [  78 5068]]
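Overall accuracy follows directly from the confusion matrices above: it is the sum of the diagonal divided by the sum of all entries. The slides do not state which axis holds the true labels, but accuracy does not depend on that choice. The helper name is ours.

```python
def accuracy(matrix):
    """Share of test emails on the main diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


naive_bayes = [[4822, 143], [115, 5031]]
svm = [[4843, 122], [78, 5068]]

nb_accuracy = accuracy(naive_bayes)
svm_accuracy = accuracy(svm)
```

With these matrices the accuracy is about 97.4% for Naive Bayes and about 98.0% for the SVM, so the SVM misclassifies fewer test emails (200 against 258).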
Top 10 words for Ham:
Top 10 Words for Spam:
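One way such a ranking could be produced is with a word counter applied to each class separately. This is a sketch under our own assumptions (lowercasing, whitespace tokenisation, alphabetic words only) and not necessarily how the lists on the slides were generated.

```python
from collections import Counter


def top_words(emails, n=10):
    """Return the n most frequent alphabetic words across a list of emails."""
    counts = Counter()
    for text in emails:
        counts.update(w for w in text.lower().split() if w.isalpha())
    return counts.most_common(n)
```

Calling `top_words(ham_emails)` and `top_words(spam_emails)` on the two classes would give one list per class; `ham_emails` and `spam_emails` are hypothetical names for the lists of email bodies.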
Which emails are generally longer?
We calculated the average word count for ham emails and spam emails separately and compared the two.
Average word count for ham emails: 365.5 words
Average word count for spam emails: 261.3 words
So ham emails are generally longer than spam emails.
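The comparison above amounts to a mean over per-email word counts. A minimal sketch, assuming words are separated by whitespace (the slides do not say how words were counted):

```python
def average_word_count(emails):
    """Mean number of whitespace-separated words per email."""
    if not emails:
        return 0.0
    return sum(len(text.split()) for text in emails) / len(emails)
```

Applying this function to the ham emails and to the spam emails separately gives the two averages that are compared.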
ROC Curve:
Division of Tasks:
Source Code: mainly Aman, with help from Vikas and Shareesh
Report: mainly Vikas, with help from Aman and Shareesh
Presentation: mainly Shareesh, with help from Aman and Vikas
Video: all three of us.
Thank You !!
