FACULTY OF INFORMATICS AND COMPUTING
FINAL YEAR PROJECT 1
CSF 35104
INTRODUCTION
• Spam is one of the biggest contributors to ransomware and malware
• Spam detection helps to determine whether a message is spam or ham
• Text classification is used to clean the raw data with any preferred filter
• Training data is chosen based on need
• Calculate the accuracy of the classifier and the weight of the data
PROBLEM STATEMENT
• Spam is a waste of time, storage space and communication bandwidth.
• It is hard to manually compare the accuracy of classified data.
• Rules in previous techniques must be constantly updated and maintained.
OBJECTIVES
1. To study how to use machine learning for spam detection
2. To modify a machine learning algorithm in computer system settings
3. To leverage the modified machine learning algorithm in knowledge analysis software
4. To test the machine learning algorithm on real data from a machine learning data repository
• Use Azure Machine Learning Studio
• Download real data from a machine learning data repository
• Preprocess the data set
  - formatting
  - uploading
  - normalizing
• Model training
  - choosing classifier
  - scoring classifier
• Create and set up the web service
• Score the data to determine the accuracy of spam detection
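The final scoring step can be illustrated with a minimal sketch of an accuracy calculation. The label values and the function name `accuracy` are assumptions made for illustration; the slides do not state how Azure Machine Learning Studio reports the score internally.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of messages whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot score an empty data set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels for five messages (not taken from the project data).
actual = ["spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "spam", "spam", "ham"]
print(accuracy(actual, predicted))  # 4 of 5 correct -> 0.8
```

Accuracy alone can be misleading when ham greatly outnumbers spam, so it is usually read alongside the counts of misclassified spam and ham.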
[Figure: workflow diagram. Stages shown: raw data, apply data preprocessing (formatting data into .csv), formatted data, data preprocessing modules, feature extraction, prepared data, model training (Vowpal Wabbit algorithm; choosing classifier, scoring classifier), evaluated model, web creation, training into ML application. Both data preparation stages are marked "Iterate until data ready".]
Open Azure Machine Learning Studio
Format the data files into training and testing data sets (.csv)
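Splitting the data into training and testing sets could look like the sketch below. The 80/20 ratio, the column names `label` and `message`, and the helper names are assumptions; the slides do not state which split was used.

```python
import csv
import io
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (train_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def to_csv_text(header, rows):
    """Serialize a header and rows as CSV text, ready to be saved to a file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical (label, message) rows standing in for the downloaded data.
rows = [("spam" if i % 2 == 0 else "ham", f"message {i}") for i in range(10)]
train_rows, test_rows = split_rows(rows)
train_csv = to_csv_text(["label", "message"], train_rows)
test_csv = to_csv_text(["label", "message"], test_rows)
print(len(train_rows), len(test_rows))  # 8 2
```

Fixing the seed keeps the split reproducible, so repeated experiments score the classifier on the same held-out messages.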
Upload the formatted data into datasets
Use the uploaded data set to start preprocessing the data
The view of the data before preprocessing
Set the parameter to messages and launch the column selector
Preprocessed dataset result
Use Select Columns in Dataset to show only the classification type
and the preprocessed message
The raw message column is no longer present
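Outside the Studio, the same two steps (cleaning the message text, then keeping only the classification type and the preprocessed message) might be sketched as follows. The cleaning rules here (lowercasing, dropping digits and punctuation) and the column names are assumptions; the slides do not list which preprocessing options were enabled.

```python
import re


def preprocess_message(text):
    """Lowercase, replace non-letter characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(text.split())


def preprocess_rows(rows):
    """Keep only the label and the preprocessed message; the raw message is dropped."""
    return [
        {
            "label": row["label"],
            "preprocessed_message": preprocess_message(row["message"]),
        }
        for row in rows
    ]


# Hypothetical raw rows (not taken from the project data).
raw = [
    {"label": "spam", "message": "WIN a FREE prize!!! Call 0800-123 now."},
    {"label": "ham", "message": "See you at 5pm, ok?"},
]
cleaned = preprocess_rows(raw)
print(cleaned[0]["preprocessed_message"])  # win a free prize call now
```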
Once the data is ready, it can be used to start model training
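To make the idea of training a binary spam/ham classifier concrete, here is a minimal sketch. It is not the Vowpal Wabbit module used in the project: it swaps in a plain logistic regression trained by stochastic gradient descent on word counts, and the example messages, learning rate and epoch count are all assumptions.

```python
import math


def build_vocabulary(messages):
    """Assign each distinct word an index in the feature vector."""
    vocabulary = {}
    for message in messages:
        for word in message.split():
            vocabulary.setdefault(word, len(vocabulary))
    return vocabulary


def to_vector(message, vocabulary):
    """Count known words; words outside the vocabulary are ignored."""
    vector = [0.0] * len(vocabulary)
    for word in message.split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1.0
    return vector


def spam_probability(weights, bias, vector):
    """Logistic function of the weighted word counts."""
    score = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, vocabulary, epochs=50, learning_rate=0.5):
    """Fit weights by stochastic gradient descent on the logistic loss."""
    weights = [0.0] * len(vocabulary)
    bias = 0.0
    for _ in range(epochs):
        for message, label in examples:
            vector = to_vector(message, vocabulary)
            error = spam_probability(weights, bias, vector) - label
            bias -= learning_rate * error
            weights = [w - learning_rate * error * x for w, x in zip(weights, vector)]
    return weights, bias


# Hypothetical preprocessed messages; label 1 = spam, 0 = ham.
examples = [
    ("win a free prize now", 1),
    ("meeting moved to noon", 0),
    ("free cash win now", 1),
    ("lunch tomorrow at noon", 0),
]
vocabulary = build_vocabulary(message for message, _ in examples)
weights, bias = train(examples, vocabulary)
for message, label in examples:
    p = spam_probability(weights, bias, to_vector(message, vocabulary))
    print(message, "->", "spam" if p >= 0.5 else "ham")
```

The 0.5 threshold turns the predicted probability into the two classes; a real experiment would score held-out test messages rather than the training examples.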
METHOD: Binary classifier
TECHNIQUE: Text classification
ALGORITHM: Vowpal Wabbit algorithm
o Fast learning algorithm
o Converts data into a vector of features
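Converting text into a vector of features can be sketched with the hashing trick, the general technique Vowpal Wabbit is known for. The sketch below is an illustration only: it uses CRC32 as a stand-in hash and a 16-slot vector, both assumptions, and it does not reproduce Vowpal Wabbit's own hash function or input format.

```python
import zlib


def hash_features(text, num_buckets=16):
    """Map a message to a fixed-length count vector using the hashing trick."""
    vector = [0] * num_buckets
    for token in text.lower().split():
        # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
        index = zlib.crc32(token.encode("utf-8")) % num_buckets
        vector[index] += 1
    return vector


# Hypothetical message; every token lands in one of the 16 slots.
features = hash_features("free prize free entry")
print(len(features), sum(features))  # 16 4
```

Because the vector length is fixed in advance, no vocabulary has to be stored, which is part of what keeps this kind of learner fast. The cost is that unrelated words can collide in the same slot.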
Spam detection using machine learning based binary classifier_043660
