Spam Detection Using Natural Language Processing

Natural language processing (NLP) is an area of computer science and artificial intelligence concerned with the interactions between computers and human (natural) languages.

 INTRODUCTION
 NATURAL LANGUAGE PROCESSING
 SPAMMING
 RELATED WORK
 N-GRAM MODELLING
 WORD STEMMING
 BAYESIAN CLASSIFICATION

 PROPOSED MODEL
 COMPONENTS
 BLOCK DIAGRAM
 WORKING
 CONCLUSION
 QUESTION & ANSWER
 THANK YOU

Natural Language Processing (NLP)
is a field of computer science and
linguistics concerned with the
interactions between computers and
human (natural) languages.

Spamming is the use of electronic
messaging systems such as e-mail,
other digital delivery systems, and
broadcast media to send unwanted
bulk messages indiscriminately.

RELATED WORK
 N-GRAM MODELLING
 WORD STEMMING
 BAYESIAN CLASSIFICATION

 An N-gram is an N-character slice of a longer string.
 N-grams of several different lengths are used simultaneously
when detecting spam emails.
 In an N-gram, a blank space is replaced by the underscore
character (“_”).

Here is an example:
WORD = STAY
 Bi-grams: _S, ST, TA, AY, Y_
 Tri-grams: _ _S, _ST, STA, TAY, AY_, Y_ _
 Quad-grams: _ _ _S, _ _ST, _STA, STAY,
TAY_, AY_ _, Y_ _ _
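
The example above can be reproduced with a short Python sketch. It assumes the padding rule implied by the STAY example (N-1 underscores on each side of the word); the function name char_ngrams is hypothetical and not part of the original slides.

```python
def char_ngrams(word, n, pad="_"):
    """Return the character n-grams of `word`, padded with n-1 pad characters per side."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Blank spaces inside the text are replaced by the pad character as well.
    text = word.replace(" ", pad)
    padded = pad * (n - 1) + text + pad * (n - 1)
    # Slide a window of width n across the padded string.
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


for n in (2, 3, 4):
    print(n, char_ngrams("STAY", n))
```

Calling the function with several values of n, as in the loop, gives the mixed-length N-gram profile that the slides describe.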

The following steps are used in a word stemming algorithm:
 Remove all non-alphabetic characters. (Allow some characters
like '/', '\', '|' etc., which can be used together to look like
letters, such as \/ for 'V'.)
 Remove all vowels from the word except the initial one.
 Replace all digits and symbols with similar-looking letters or
digits.

 Replace consecutive repeated characters with a
single character and vice versa.
 Use phonetic algorithms like Soundex on the
resultant string.
 Give it a numeric value depending on the
operations performed over it.
 Update the database for that particular keyword.
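
A minimal Python sketch of the normalization steps listed above. Several things here are assumptions rather than details from the slides: the LOOKALIKES table, the order in which the steps run (look-alike replacement first, so that obfuscated letters survive the non-alphabetic filter), and the use of a simple count of altering steps as the numeric value. The Soundex step and the database update are omitted.

```python
import re

# Hypothetical look-alike table; a real filter would need a far larger one.
LOOKALIKES = {"\\/": "v", "0": "o", "1": "i", "3": "e", "4": "a",
              "5": "s", "@": "a", "$": "s", "|": "l"}


def stem_keyword(token):
    """Return (stem, changes); changes counts the steps that altered the token."""
    word = token.lower()
    history = [word]

    # Replace look-alike digits and symbols, longest patterns first.
    for fake in sorted(LOOKALIKES, key=len, reverse=True):
        word = word.replace(fake, LOOKALIKES[fake])
    history.append(word)

    # Remove any remaining non-alphabetic characters.
    word = re.sub(r"[^a-z]", "", word)
    history.append(word)

    # Collapse runs of a repeated character into a single character.
    word = re.sub(r"(.)\1+", r"\1", word)
    history.append(word)

    # Remove all vowels except an initial one.
    word = word[:1] + re.sub(r"[aeiou]", "", word[1:])
    history.append(word)

    changes = sum(1 for before, after in zip(history, history[1:]) if before != after)
    return word, changes


print(stem_keyword("m0ney"), stem_keyword("money"))
```

Obfuscated and plain spellings collapse to the same stem, while the count records how much rewriting was needed to get there.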

 Bayesian Spam Filtering is a statistical method of
classifying an email as either spam or legitimate.
 Naive Bayes classifiers work by correlating the use of
tokens with spam and non-spam emails and then using
Bayesian inference to calculate a probability that an email is
or is not spam.

Here,
A = no. of ham messages
b = no. of spam messages
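
The formula these symbols belong to did not survive in the slide text, so the following Python sketch does not reconstruct it. Instead it shows a standard multinomial Naive Bayes filter with add-one (Laplace) smoothing, in which the ham and spam message counts supply the class priors. The class name, method names, and training messages are all hypothetical, and the filter assumes at least one training message per class.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()  # number of spam and ham messages seen

    def train(self, tokens, label):
        self.token_counts[label].update(tokens)
        self.message_counts[label] += 1

    def spam_probability(self, tokens):
        vocabulary = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: the share of training messages with this label.
            score = math.log(self.message_counts[label] / total_messages)
            total_tokens = sum(self.token_counts[label].values())
            for token in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the product.
                count = self.token_counts[label][token]
                score += math.log((count + 1) / (total_tokens + len(vocabulary)))
            log_scores[label] = score
        # Convert the two log scores into a normalized probability of spam.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = NaiveBayesSpamFilter()
spam_filter.train(["win", "free", "prize"], "spam")
spam_filter.train(["free", "offer"], "spam")
spam_filter.train(["meeting", "tomorrow"], "ham")
spam_filter.train(["project", "report", "tomorrow"], "ham")
print(spam_filter.spam_probability(["free", "prize"]))
```

Working in log space avoids numeric underflow when a message contains many tokens.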

 Email Input
 URL Source Check
 URL Blacklist Database
 Threshold Counter
 Keyword Extractor
 Classifier
 NLP Engine

WORKING
 An unclassified email is taken as input for processing.
 The URL Source Check block checks the URL of the incoming
email and searches the URL Blacklist Database for a match. If a
match is found, the email is categorized as spam; otherwise it is
considered ham and processed further.
 At the same time, a Threshold Counter keeps track of the
number of emails coming from this source over a period of
time t.

 The email is then passed to the Keyword Extractor, which
tokenizes the message into keywords.
 These keywords are passed to the Classifier, which
classifies the mail into a specific category E.
 The email is then passed to the NLP Engine along with its
category.
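
The flow described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the blacklist entry, the one-hour window standing in for t, the treatment of the email's URL as a source domain, and the stand-in keyword_classifier are all hypothetical. The slides do not say what happens when a source sends too many emails, so the sketch only reports the count.

```python
import re
from collections import defaultdict, deque

URL_BLACKLIST = {"bulk-offers.example"}  # hypothetical blacklist entries
WINDOW_SECONDS = 3600                    # the period t; this value is an assumption

_arrivals = defaultdict(deque)           # source -> timestamps of recent emails


def count_recent(source, now):
    """Threshold Counter: record an arrival, return the count within the window."""
    stamps = _arrivals[source]
    stamps.append(now)
    # Drop arrivals that are older than the window.
    while now - stamps[0] > WINDOW_SECONDS:
        stamps.popleft()
    return len(stamps)


def extract_keywords(body):
    """Keyword Extractor: split the message into lowercase word tokens."""
    return re.findall(r"[a-z]+", body.lower())


def keyword_classifier(keywords):
    """Stand-in for the Classifier block; a trained model would go here."""
    return "spam" if "prize" in keywords else "ham"


def process_email(source, body, now, classify):
    """Run one email through the blocks and return what is handed to the NLP Engine."""
    recent = count_recent(source, now)
    if source in URL_BLACKLIST:
        # A blacklist match short-circuits the rest of the pipeline.
        return {"category": "spam", "keywords": [], "recent_from_source": recent}
    keywords = extract_keywords(body)
    return {"category": classify(keywords), "keywords": keywords,
            "recent_from_source": recent}


print(process_email("mail.example", "Claim your PRIZE now", 10, keyword_classifier))
```

Passing the classifier in as a function keeps the pipeline independent of the classification method, so a Bayesian classifier could replace the stand-in without changing the other blocks.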

Spam mails are a serious concern and a major annoyance
for many Internet users. The model proposed as a solution in
this paper is beneficial because it introduces a threshold
counter, which helps relieve congestion on the web server
while maintaining the spam filter's efficiency. At the same
time, it requires additional storage space for the
databases.
