spam_msg_detection.pdf

Spam Message Detection
Under the guidance of Prof. Abhishek Gupta
Computer Science & Engineering (Deptt.)
Submitted By - Bhole Shankar Singh (19BCS027)
Shri Mata Vaishno Devi University
Katra, J&K 182320

Introduction
Chat is only one aspect of SMS. SMS technology was
made possible by an accepted international standard.
Spam is the term for the abuse of electronic
messaging services to send large numbers of unwanted
messages to anyone, and SMS spam is a well-known
example.
Real SMS spam datasets from the ML repository are
used here. After preprocessing and feature
extraction, several machine learning methods, such
as Naive Bayes and Random Forest, are applied to the
datasets, with NLTK used for text processing.
We used libraries such as numpy, pandas, and
sklearn.

Objective
The dangers of spam messages for users
are many: undesired advertisements,
exposure of private information,
falling victim to fraud or a financial
scheme, being lured to malware and
phishing websites, involuntary exposure
to inappropriate content, etc. For the
network operator, spam messages
increase operating costs.
The approach used in this study reduces
the total error rate relative to the
best model in the original research
that it references.

Some Important ML Algorithms
Naive Bayes
Bayes' theorem, named after the Reverend Thomas Bayes, is the basis of one of the
earliest probabilistic algorithms. We used the multinomial Naive Bayes classifier
(MultinomialNB in sklearn.naive_bayes) from the sklearn library. The model learns to
classify each message as spam or ham by applying the Naive Bayes formula to the
frequency of each word in the message.
Using the alternative form of the Naive Bayes formula, we calculate the probability
that a message is spam from the spam probability of each of its words: the per-word
probabilities are multiplied together to give a score for the whole message. If the
resulting P(Spam | message) > 0.5, the multinomial Naive Bayes model classifies the
message as spam; otherwise, it classifies it as ham.
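The word-frequency calculation described for Naive Bayes can be sketched from scratch as follows. This is a minimal illustration written with only the Python standard library, not the sklearn implementation used in the project. The function names (train_nb, spam_probability, classify), the Laplace smoothing value alpha=1.0, and the four toy training messages are all assumptions made for the example.

```python
import math
from collections import Counter

def train_nb(messages, labels, alpha=1.0):
    """Count word frequencies per class (labels must be 'spam' or 'ham')."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return {"word_counts": word_counts, "class_counts": class_counts,
            "vocab": vocab, "alpha": alpha}

def spam_probability(model, text):
    """Return P(spam | message); sums logs instead of multiplying to avoid underflow."""
    total_docs = sum(model["class_counts"].values())
    log_scores = {}
    for label in ("spam", "ham"):
        counts = model["word_counts"][label]
        # Laplace-smoothed denominator: class word total + alpha * vocabulary size.
        denom = sum(counts.values()) + model["alpha"] * len(model["vocab"])
        score = math.log(model["class_counts"][label] / total_docs)  # class prior
        for word in text.lower().split():
            if word in model["vocab"]:  # words never seen in training are ignored
                score += math.log((counts[word] + model["alpha"]) / denom)
        log_scores[label] = score
    # Normalize the two class scores into a probability for the spam class.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp_scores["spam"] / sum(exp_scores.values())

def classify(model, text, threshold=0.5):
    """Label a message spam when P(spam | message) exceeds the threshold."""
    return "spam" if spam_probability(model, text) > threshold else "ham"

# Invented toy data, for illustration only (not taken from Spam.csv).
train_messages = [
    "win a free prize now",
    "free entry claim your prize",
    "are you coming home for dinner",
    "see you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = train_nb(train_messages, train_labels)
print(classify(model, "claim your free prize"))
print(classify(model, "see you at dinner"))
```

The sketch assumes both classes appear in the training data. Summing log-probabilities gives the same ranking as multiplying the per-word probabilities, but stays numerically stable for long messages.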

NLTK
With the help of the NLP library NLTK, first remove punctuation and
special symbols from every SMS and then convert it to lower case. Each
SMS can also be tokenized into sentences and words after punctuation and
special symbols are removed. Here I simply split each SMS into words on
white space, although proper tokenization and parsing may be a better
way to split the text. Converting all the data to lower case helps
during preprocessing and in later stages of the NLP application.
The messages are written in human-readable language, which a computer
cannot understand directly, so we use NLP to let the computer read
natural-language SMS and determine which parts are important.
Natural language processing (NLP) is a branch of artificial intelligence
that helps computers understand, interpret, and manipulate human
language. It makes it possible for computers to read text, hear speech,
interpret it, measure sentiment, and determine which parts are important.
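The cleaning steps described above (remove punctuation, lower-case, split on white space) can be sketched as follows. This version uses only the Python standard library rather than NLTK. The function name preprocess and the sample message are assumptions made for illustration.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# Non-ASCII symbols (for example currency signs) are left untouched.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def preprocess(sms):
    """Strip ASCII punctuation, lower-case the text, and split on white space."""
    cleaned = sms.translate(_PUNCT_TABLE).lower()
    return cleaned.split()

# Invented sample message, for illustration only.
print(preprocess("WINNER!! Claim your FREE prize now, call 12345."))
# → ['winner', 'claim', 'your', 'free', 'prize', 'now', 'call', '12345']
```

Where a more careful split is wanted, NLTK's word_tokenize function is one alternative to str.split, at the cost of installing NLTK and its tokenizer data.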

DATASET
Dataset_Name = Spam.csv
Definite_Variable = Ham and Spam labels, used as the output
Indefinite_Variable = all messages, used as the input
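Reading the messages (input) and their Ham/Spam labels (output) from a CSV file might look like the sketch below. The slides do not give the column layout of Spam.csv, so the header names "label" and "message" are assumptions, as are the function name load_sms and the two sample rows. An in-memory string stands in for the real file.

```python
import csv
import io

def load_sms(csv_file, label_col="label", text_col="message"):
    """Read messages and labels from a CSV file object that has a header row."""
    reader = csv.DictReader(csv_file)
    messages, labels = [], []
    for row in reader:
        messages.append(row[text_col])                  # input: the SMS text
        labels.append(row[label_col].strip().lower())   # output: 'ham' or 'spam'
    return messages, labels

# Stand-in for open("Spam.csv", newline=""); both rows are invented.
sample = io.StringIO('label,message\nham,"Ok, see you soon"\nspam,Free prize waiting\n')
messages, labels = load_sms(sample)
print(labels)
# → ['ham', 'spam']
```

If the real file uses different header names, they can be passed through the label_col and text_col parameters.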

Conclusion
With an overall accuracy of 98.60%, enhanced Naive Bayes is the next best
classifier in the original research. Compared with the results of earlier
research, our classifier cuts the overall error in half. The factors behind this
improvement include the addition of significant features such as the number of
characters in a message, specific thresholds for message length, and analysis
of learning curves and misclassified data.
A key benefit of Naive Bayes over other classification methods is its ability
to handle a very large number of features. Since there are hundreds of distinct
words, all of them are treated as features in our case.
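Treating every distinct word as a feature and adding the character count of the message can be sketched as below. This is an illustration using only the Python standard library. The function names build_vocabulary and featurize and the two sample messages are assumptions, and the length thresholds mentioned in the slides are not reproduced because their values are not given.

```python
from collections import Counter

def build_vocabulary(messages):
    """Map each distinct lower-cased word to a column index (sorted order)."""
    words = sorted({w for m in messages for w in m.lower().split()})
    return {word: i for i, word in enumerate(words)}

def featurize(message, vocab):
    """Return per-word counts over the vocabulary plus one character-count feature."""
    counts = Counter(message.lower().split())
    # dict preserves insertion order, so columns follow the vocabulary indices.
    vector = [counts.get(word, 0) for word in vocab]
    vector.append(len(message))  # message length in characters
    return vector

# Invented sample messages, for illustration only.
corpus = ["free prize free", "see you soon"]
vocab = build_vocabulary(corpus)
print(featurize("free prize free", vocab))
# → [2, 1, 0, 0, 0, 15]
```

With a real corpus the vector has one column per distinct word, which is where the large feature count comes from.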
