Emailphishing(deep anti phishnet applying deep neural networks for phishing email detection)

Venkat Java Projects
Mobile:+91 9966499110
Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com
DeepAnti-PhishNet: Applying Deep Neural Networks for
Phishing Email Detection
In this paper the author evaluates the performance of four deep learning algorithms, namely
convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory
(LSTM) and multi-layer perceptron (MLP), for detecting phishing email. Both word embedding and
neural bag-of-ngrams help to extract the syntactic and semantic similarity of emails.
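As a rough illustration of the bag-of-ngrams idea mentioned above, the sketch below only counts word n-grams in a message. The neural variant in the paper learns embeddings on top of such n-grams, which is not shown here. The function names and the sample sentence are hypothetical and chosen for illustration.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of `text` as a list of tuples."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=2):
    """Count every word n-gram for n = 1 .. max_n (word order beyond n is ignored)."""
    bag = Counter()
    for n in range(1, max_n + 1):
        bag.update(word_ngrams(text, n))
    return bag


# Hypothetical sample message, not taken from the dataset.
bag = bag_of_ngrams("verify your account now verify your password")
print(bag.most_common(3))
```

Bigrams such as ("verify", "your") keep a little local word order that single words lose, which is the reason n-grams are used alongside word embeddings.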
Phishing represents a genuine risk to the Internet economy. Email has become a necessary
communication tool in contemporary life, and it remains the most commonly used medium for
launching phishing attacks. As a result, detection of phishing emails is considered an important
task in the field of cybersecurity.
Phishing mails are a type of spam mail that is hazardous to users. A phishing mail can steal our
data without our knowledge once it is opened, so distinguishing phishing mails from other spam
mails is very important. One way to protect our data from phishing mail is to add a secondary
password to the login credentials. Another way is to alert the user once a phishing mail tries to
steal our data. During the early stages of email communication, clear rules were followed [SHP08],
but recently, due to the diversity of email programs and formatting standards, we have the freedom
to edit and change quoted text. Despite these limitations, Symantec Brightmail Sanz [SHP08] still
shows good performance in detecting phishing emails. Moreover, it can keep track of the IP
(Internet Protocol) addresses that sent phishing mail. Its performance was comparable to [MW04].
Email clients such as Microsoft Outlook and Mozilla Thunderbird, and online email services such as
Gmail, usually group emails into conversations and attempt to hide quoted parts in order to
improve readability.
In this task we propose a machine learning based approach to extract the underlying structure of
email text and overcome the problems of error-prone rule-based approaches. This enables
downstream tasks to work with much cleaner data and additional information by focusing on
particular parts. We also show the performance improvements and flexibility over previous work on
similar tasks.
The email phishing detector is trained with machine learning algorithms to generate a
trained model. Whenever a new email arrives, this model is applied to the new request to
determine whether it is SPAM or HAM. In this paper we evaluate the performance of
four machine learning algorithms, LSTM, MLP, CNN and ANN, and through experiment
we conclude that ANN and LSTM perform better than CNN and MLP in terms of accuracy.
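The train-once, apply-to-new-email workflow described above can be sketched as follows. This is a minimal sketch and not the project's code: a single-layer perceptron over word counts stands in for the ANN, LSTM, CNN and MLP models, and the four training messages are made up for illustration.

```python
from collections import defaultdict


def train_perceptron(examples, epochs=10):
    """Train a bag-of-words perceptron on (text, label) pairs; label is 0 (ham) or 1 (spam)."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            tokens = text.lower().split()
            score = bias + sum(weights[t] for t in tokens)
            predicted = 1 if score > 0 else 0
            error = label - predicted  # 0 when the prediction is already correct
            if error:
                for t in tokens:
                    weights[t] += error
                bias += error
    return weights, bias


def predict(model, text):
    """Apply the trained model to a new email and return 'spam' or 'ham'."""
    weights, bias = model
    score = bias + sum(weights.get(t, 0.0) for t in text.lower().split())
    return "spam" if score > 0 else "ham"


# Hypothetical toy training set (1 = spam, 0 = ham).
training_set = [
    ("win free prize now", 1),
    ("free cash offer click", 1),
    ("see you at lunch", 0),
    ("ok call me later", 0),
]
model = train_perceptron(training_set)
print(predict(model, "free prize click"))
print(predict(model, "call you later"))
```

The point of the sketch is the separation of the two phases: the model is built once from labelled mail and is then reused, unchanged, for every new message.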
In this paper the author evaluates the performance of SVM and ANN.
With these algorithms the author applied Correlation Based and Chi-Square Based feature
selection algorithms to reduce the dataset size. These feature selection algorithms remove
irrelevant data from the dataset, and the model is then built with the important features only.
Because of feature selection, the dataset size is reduced and the prediction accuracy
increases.
To conduct the experiment the author used an email dataset, and below are some example
records from that dataset. I have used the same dataset, which is available inside the
‘dataset’ folder.
Dataset example
[Message,Category]
The comma-separated names above are the column names of the dataset.
ham,"Go until jurong point, crazy.. Available only in bugis n great
world la e buffet... Cine there got amore wat..."
ham,Ok lar... Joking wif u oni...
The two records above are sample values; the first value in each record is the class label, ham or spam.
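Records in this shape can be read with Python's standard `csv` module, which handles the quoted message that contains a comma. The sketch below parses the two sample records shown above from an in-memory string. The variable names are illustrative, and reading the real file from the ‘dataset’ folder is not shown because its file name is not given here.

```python
import csv
import io

# The two sample records from the dataset example (the wrapped first record is joined).
sample = (
    'ham,"Go until jurong point, crazy.. Available only in bugis n great '
    'world la e buffet... Cine there got amore wat..."\n'
    'ham,Ok lar... Joking wif u oni...\n'
)

# csv.reader keeps the quoted message together even though it contains a comma.
rows = list(csv.reader(io.StringIO(sample)))
for label, message in rows:
    print(label, "->", message[:30])
```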
In the dataset records above, all values are in string format. Values that are not important
for prediction are removed by applying the PREPROCESSING concept. The algorithms cannot
identify class labels given in string format, so we need to assign a numeric value to each
label. All of this is done in the PREPROCESS step, which then generates a new file called
‘clean.txt’ that is used to build the training model.
In the line below I assign a numeric id to each label:
"ham":0,"spam":1
In the line above we can see that ham has id 0 and spam has id 1.
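The label mapping above can be applied with a plain dictionary lookup. This is a minimal sketch; the names `LABEL_IDS` and `encode_labels` are hypothetical and not taken from the project code.

```python
# Numeric id for each class label, as given in the text.
LABEL_IDS = {"ham": 0, "spam": 1}


def encode_labels(labels):
    """Convert string class labels to their numeric ids."""
    return [LABEL_IDS[label.strip().lower()] for label in labels]


encoded = encode_labels(["ham", "spam", "ham"])
print(encoded)
```

An unknown label raises `KeyError`, which is usually preferable to silently assigning it a default id.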
Before running the code, execute the two commands below.
Screenshots
Double-click the ‘run.bat’ file to get the screen below.

In the screen above, click the ‘Upload Email SPAM Dataset’ button and upload the dataset.
The screen below shows that the dataset is uploaded.

Now click the ‘Pre-process Dataset’ button to clean the dataset: it checks for missing
and NaN values, then applies tokenization and word2vec.
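The first two preprocessing steps can be sketched with the standard library alone. This is an illustrative sketch under assumptions, not the project's preprocessing code: the helper names and the tokenization pattern are hypothetical, and the word2vec step is left out because it needs a trained embedding model.

```python
import re


def clean_records(records):
    """Drop (label, message) records whose label or message is missing, NaN or blank."""
    cleaned = []
    for label, message in records:
        if label is None or message is None:
            continue
        if isinstance(message, float) and message != message:  # NaN is not equal to itself
            continue
        if not str(message).strip():
            continue
        cleaned.append((label, str(message)))
    return cleaned


def tokenize(message):
    """Lowercase the message and split it into word tokens (assumed pattern)."""
    return re.findall(r"[a-z0-9']+", message.lower())


records = [
    ("ham", "Ok lar... Joking wif u oni..."),
    ("spam", None),          # missing message
    ("ham", float("nan")),   # NaN message
    ("ham", "   "),          # blank message
]
cleaned = clean_records(records)
tokens = tokenize(cleaned[0][1])
print(tokens)
```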
Now click ‘Generate Training Model’ to split the data into train and test sets and generate
the model for prediction.

In the screen above we can see that the dataset contains 5572 records in total, of which 4457
are used for training and 1115 for testing. Now click ‘Run ANN Algorithm’ to generate the
ANN model and calculate its accuracy.
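The record counts above are consistent with an 80/20 split in which the test size is rounded up. The sketch below reproduces the arithmetic with a shuffled index split. The 0.2 test fraction, the rounding rule and the seed are assumptions, since the actual split settings are not stated.

```python
import math
import random


def train_test_split_indices(n_records, test_fraction=0.2, seed=42):
    """Shuffle record indices and split them into (train, test) lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_records * test_fraction)  # assumed: test size rounds up
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(5572)
print(len(train_idx), len(test_idx))
```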
In the screen above we can see that ANN achieved 97.66% accuracy on the text data. Now
click ‘Run LTSM Algorithm’ to calculate the LSTM accuracy.

In the screen above LSTM achieved 98.11% accuracy. Now click ‘Run CNN Algorithm’ to
generate the CNN model and calculate its accuracy.
In the screen above CNN achieved 86.13% accuracy. Now click ‘Run MLP Algorithm’ to
generate the MLP model and calculate its accuracy.

In the screen above we got 86.13% accuracy for the MLP algorithm. Now click the
‘Graph’ button to upload test data and predict whether each test record is normal or
contains an attack. The test data has no class label (0 or 1); the application predicts the
label and gives us the result. See below some records from the test data.
Now click the ‘Accuracy Graph’ button to see the accuracy comparison in graph format.

From the graph above we can see that ANN and LSTM achieved better accuracy than the
CNN and MLP algorithms. In the graph, the x-axis shows the algorithm name and the y-axis
shows the accuracy of that algorithm.
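The comparison can also be reproduced in text form from the accuracy figures reported in the walkthrough. The sketch below ranks the four algorithms and prints a rough text bar for each. It is an illustration only and not the application's graph code.

```python
# Accuracy values (percent) as reported in the screens above.
accuracies = {"ANN": 97.66, "LSTM": 98.11, "CNN": 86.13, "MLP": 86.13}

# Sort from highest to lowest accuracy; ties keep their original order.
ranked = sorted(accuracies.items(), key=lambda item: item[1], reverse=True)

for name, accuracy in ranked:
    bar = "#" * round(accuracy / 2)  # one '#' per two percentage points
    print(f"{name:<5}{accuracy:6.2f}% {bar}")
```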
