Spam detection using machine learning based binary classifier_043660

FACULTY OF INFORMATICS AND COMPUTING
FINAL YEAR PROJECT 1
CSF 35104

INTRODUCTION
• Spam is a major delivery channel for ransomware and malware
• Spam detection determines whether a message is spam or ham (legitimate)
• Text classification is used to clean the raw data with a chosen filter
• Training data is selected based on need
• Calculate the accuracy of the classifier and the weight of the data

PROBLEM STATEMENT
• Spam is a waste of time, storage space and communication bandwidth.
• It is hard to manually compare the accuracy of classified data.
• Rules in previous techniques must be constantly updated and maintained.

OBJECTIVES
1. To study how to use machine learning for spam detection
2. To modify a machine learning algorithm in computer system settings
3. To leverage the modified machine learning algorithm in knowledge analysis software
4. To test the machine learning algorithm on real data from a machine learning data repository

• Use Azure Machine Learning Studio
• Download real data from a machine learning data repository
• Preprocess the data set
-formatting
-uploading
-normalizing
• Model training
-choosing classifier
-scoring classifier
• Creation and setup of the web service
• Score the data to determine the accuracy of spam detection
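
The final scoring step can be illustrated with a small sketch. It assumes the scored output is available as two lists of labels (actual and predicted); the function name `score_predictions` and the sample labels are hypothetical and are not taken from the Azure experiment.

```python
def score_predictions(actual, predicted, positive="spam"):
    """Return accuracy, precision and recall for binary spam/ham labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    # Count the four cells of the confusion matrix.
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels, for illustration only.
actual = ["spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "spam", "ham", "ham"]
scores = score_predictions(actual, predicted)
```

Accuracy alone can be misleading when ham greatly outnumbers spam, which is why precision and recall are reported alongside it here.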

[Flow diagram: Raw data → Apply preprocessing (formatting data into .csv) → Formatted data → Data preprocessing modules → Prepared data → Features extraction (Vowpal Wabbit algorithm) → Model training (choosing classifier, scoring classifier) → Evaluated model → Web creation → Training into ML application. The preprocessing stages iterate until the data is ready.]

Open Azure Machine Learning Studio

Format the data files into training and testing data sets in .csv format
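
A minimal sketch of this formatting step, done outside the studio with the Python standard library. The column names `label` and `message`, the 80/20 split, and the sample rows are assumptions made for illustration; the slides do not state them.

```python
import csv
import io
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def to_csv_text(rows, header=("label", "message")):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample rows, for illustration only.
rows = [
    ("ham", "See you at lunch"),
    ("spam", "WIN a FREE prize now!!!"),
    ("ham", "Can you send the report?"),
    ("spam", "Cheap loans, click here"),
    ("ham", "Happy birthday!"),
]
train_rows, test_rows = split_rows(rows, test_fraction=0.2, seed=42)
train_csv = to_csv_text(train_rows)
test_csv = to_csv_text(test_rows)
```

The two CSV strings can then be written to files and uploaded as separate training and testing data sets.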

Upload the formatted data into Datasets

Use the uploaded data set to start preprocessing the data

Visualization of the data before preprocessing

Set the parameter to the message column and launch the column selector

Use Select Columns in Dataset to show only the classification type
and the preprocessed message

The raw message column is no longer present
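
The effect of the preprocessing and column-selection steps can be approximated in plain Python. This is only a sketch: the cleaning rules (lowercasing, dropping digits and punctuation, collapsing whitespace) and the field names `label`, `message` and `preprocessed_message` are assumptions, not the exact behavior or settings of the studio modules.

```python
import re


def preprocess_message(text):
    """Lowercase the text, drop non-letter characters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(text.split())


def preprocess_and_select(records):
    """Keep only the label and the preprocessed message of each record."""
    return [
        {
            "label": record["label"],
            "preprocessed_message": preprocess_message(record["message"]),
        }
        for record in records
    ]


# Hypothetical records, for illustration only.
records = [
    {"label": "spam", "message": "WIN a FREE prize now!!! Call 0800-123"},
    {"label": "ham", "message": "See you at 5pm, ok?"},
]
cleaned = preprocess_and_select(records)
```

As in the studio walkthrough, the raw `message` field does not appear in the output records.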

Once the data is ready, it can be used to start model training

METHOD:
Binary classifier
TECHNIQUE:
Text classification
ALGORITHM:
Vowpal Wabbit algorithm
o Fast learning algorithm
o Converts data into a vector of features
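
The idea of converting text into a feature vector and learning from it quickly can be sketched as follows. This is a simplified illustration, not Vowpal Wabbit itself: it uses `zlib.crc32` as a stand-in hash function, a hypothetical 10-bit feature space, and plain online logistic regression; the training messages are made up.

```python
import math
import zlib

NUM_BITS = 10                 # hypothetical size of the hashed feature space
NUM_BUCKETS = 1 << NUM_BITS


def hash_features(text):
    """Map each token to a bucket index and count occurrences per bucket."""
    features = {}
    for token in text.lower().split():
        index = zlib.crc32(token.encode("utf-8")) % NUM_BUCKETS
        features[index] = features.get(index, 0.0) + 1.0
    return features


def predict_probability(weights, features):
    """Logistic model: estimated probability that the message is spam."""
    z = sum(weights[i] * value for i, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=20, learning_rate=0.5):
    """Online gradient descent: update the weights one example at a time."""
    weights = [0.0] * NUM_BUCKETS
    for _ in range(epochs):
        for text, label in examples:
            features = hash_features(text)
            error = predict_probability(weights, features) - label
            for i, value in features.items():
                weights[i] -= learning_rate * error * value
    return weights


# Hypothetical training messages: 1 = spam, 0 = ham.
examples = [
    ("win free prize now", 1),
    ("free cash click now", 1),
    ("see you at lunch", 0),
    ("meeting moved to monday", 0),
]
weights = train(examples)
spam_probability = predict_probability(weights, hash_features("win free prize now"))
ham_probability = predict_probability(weights, hash_features("see you at lunch"))
```

Hashing keeps the feature vector at a fixed size regardless of vocabulary, and per-example updates mean the data never has to be held in memory all at once.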

