Deep Learning based Frameworks for Handling
Imbalance in DGA, Email, and URL Data Analysis
Simran K (1), Prathiksha Balakrishna (2), Vinayakumar R (3, 1), and Soman KP (1)
(1) Center for Computational Engineering and Networking, Amrita School Of Engineering,
Amrita Vishwa Vidyapeetham, Coimbatore, India. simiketha19@gmail.com
(2) Graduate School, Computer Science Department, Texas State University.
prathi.93april8@gmail.com
(3) Division of Biomedical Informatics, Cincinnati Children’s Hospital Centre, Cincinnati, OH,
United States. Vinayakumar.Ravi@cchmc.org
December 24, 2019
Overview
1 Motivation
2 Literature survey
3 Description of data set
4 Proposed Architecture
5 Experimentation
6 Results
7 Conclusion and Future work
8 References
Motivation
Cyber Security deals with techniques for protecting data, programs,
devices, and networks from attack, damage, and unauthorized access.
Deep learning architectures are employed in Cyber Security use cases
and have achieved good accuracy.
However, real-time data sets are highly imbalanced, and data mining
approaches can be used to handle this imbalance properly.
This work proposes a cost-sensitive deep learning based framework for
Cyber Security.
Literature survey
Various machine learning and deep learning models have been proposed to
classify DGA domains, URLs, and emails [4], [5], [6].
In [3], an LSTM based model is proposed to handle multi-class
imbalanced data for the detection of DGA botnets.
A cost-sensitive framework, Firstfilter, was proposed in [7] to detect
malicious URLs.
Description of data set
In this work, three different data sets were used, one for each of the
three use cases.
All samples in each data set are unique, and the train and test sets are
disjoint.
Table: Data set details (number of samples per class)

Use case                            Split  Legitimate  Malicious
Domain Generation Algorithms (DGA)  train      38,276     53,052
                                    test       12,753     17,690
Uniform Resource Locator (URL)      train      23,374     11,116
                                    test        1,142        578
Electronic mail (Email)             train      19,337     24,665
                                    test        8,153     10,706

The malicious class is DGA generated domains for DGA, malicious URLs for
URL, and spam for Email.
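The class counts in the table give a direct measure of the imbalance. One common heuristic, which is an assumption here and not something the slides specify, weights each class inversely to its frequency: w_c = N / (K * n_c), where N is the number of training samples, K the number of classes, and n_c the count of class c. A minimal sketch using the training counts above (function names are illustrative):

```python
# Training-set class counts copied from the data set table.
TRAIN_COUNTS = {
    "DGA": {"legitimate": 38276, "malicious": 53052},
    "URL": {"legitimate": 23374, "malicious": 11116},
    "Email": {"legitimate": 19337, "malicious": 24665},
}


def balanced_class_weights(counts):
    """Inverse-frequency weights: w_c = N / (K * n_c).

    This heuristic is an assumption for illustration; the slides do not
    state how the misclassification costs were chosen.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


def imbalance_ratio(counts):
    """Majority-class count divided by minority-class count."""
    return max(counts.values()) / min(counts.values())


for name, counts in TRAIN_COUNTS.items():
    weights = balanced_class_weights(counts)
    rounded = {label: round(w, 3) for label, w in weights.items()}
    print(name, round(imbalance_ratio(counts), 2), rounded)
```

With these weights the minority class always receives the larger weight, so in the URL data the malicious class is weighted more heavily, while in the DGA and Email data the legitimate class is.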
Proposed Architecture
Figure: Cost-insensitive and cost-sensitive deep learning based architectures.
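One way to read the difference between the two architectures, assuming cost sensitivity is realized as a class-weighted loss (the slides do not give the exact formulation), is that the cost-insensitive model minimizes plain binary cross-entropy while the cost-sensitive model scales each sample's loss by the cost of its true class. The function names, weights, and batch values below are hypothetical:

```python
import math


def weighted_binary_cross_entropy(y_true, p_pred, w_pos, w_neg, eps=1e-12):
    """Mean binary cross-entropy with a separate cost per true class.

    w_pos scales the loss of positive (malicious) samples and w_neg the
    loss of negative (legitimate) samples.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(w_pos * y * math.log(p)
                   + w_neg * (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Cost-insensitive case: both classes have cost 1."""
    return weighted_binary_cross_entropy(y_true, p_pred, 1.0, 1.0, eps)


# Hypothetical batch: labels (1 = malicious) and predicted probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.3, 0.1]

plain = binary_cross_entropy(labels, probs)
costly = weighted_binary_cross_entropy(labels, probs, w_pos=2.0, w_neg=1.0)
print(round(plain, 4), round(costly, 4))
```

Raising w_pos makes a missed malicious sample more expensive than a false alarm, which pushes the model toward fewer false negatives.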
Experimentation
All the classical machine learning algorithms are implemented using
Scikit-learn (https://scikit-learn.org/), and the deep learning models are
implemented using TensorFlow (https://www.tensorflow.org/) with the Keras
(https://keras.io/) framework.
To convert text into numerical values, a Keras embedding layer with an
embedding dimension of 128 was used.
Four architectures are used: DNN, CNN, LSTM, and CNN-LSTM.
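An embedding layer expects integer indices, so each domain name, URL, or email first has to be mapped to a fixed-length sequence of integers. A minimal character-level sketch follows. The character-level granularity, the reserved indices 0 (padding) and 1 (unknown), the sample strings, and the helper names are all assumptions; the slides only state that a Keras embedding of dimension 128 was used.

```python
PAD_ID = 0      # reserved index for padding (assumption)
UNKNOWN_ID = 1  # reserved index for characters not seen in training (assumption)


def build_vocab(texts):
    """Map every character seen in the training texts to an index >= 2."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: idx + 2 for idx, ch in enumerate(chars)}


def encode(text, vocab, max_len):
    """Convert text to a list of max_len integer ids (truncate or right-pad)."""
    ids = [vocab.get(ch, UNKNOWN_ID) for ch in text[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


# Hypothetical samples: one legitimate-looking and one random-looking domain.
samples = ["google.com", "xkqzjw.net"]
vocab = build_vocab(samples)
encoded = [encode(s, vocab, max_len=12) for s in samples]
print(encoded)
```

The resulting integer matrix is what an embedding layer with output dimension 128 would turn into a sequence of 128-dimensional vectors.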
For the DNN, four hidden layers with 512, 384, 256, and 128 units and a
final dense layer with one unit are used, with a dropout of 0.1.
The CNN uses 64 filters with filter length 3, followed by max pooling
with pooling length 2, a dense layer with 128 hidden units and a dropout
of 0.3, and finally a dense layer with one unit.
The LSTM contains 128 memory blocks, followed by a dropout of 0.3 and a
final dense layer with one unit.
For the CNN-LSTM, the CNN is connected to the LSTM: a CNN with 64 filters
of filter length 3 and max pooling with pooling length 2 is followed by an
LSTM with 50 memory blocks and a final dense layer with one unit.
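To make the CNN-LSTM description concrete, the sketch below traces the sequence length and the number of trainable parameters through the stack. It assumes a stride-1 convolution without padding, non-overlapping pooling, a standard LSTM with four gates and biases, and a hypothetical input length of 100 tokens; none of these details are stated in the slides.

```python
def conv1d_out_len(seq_len, kernel_size):
    """Output length of a stride-1 1D convolution without padding."""
    return seq_len - kernel_size + 1


def maxpool1d_out_len(seq_len, pool_size):
    """Output length of non-overlapping max pooling (stride = pool_size)."""
    return seq_len // pool_size


def conv1d_params(in_channels, filters, kernel_size):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units


EMBED_DIM = 128  # embedding dimension given in the slides
SEQ_LEN = 100    # hypothetical input length (assumption)

after_conv = conv1d_out_len(SEQ_LEN, kernel_size=3)
after_pool = maxpool1d_out_len(after_conv, pool_size=2)

print("time steps after convolution:", after_conv)
print("time steps after pooling:", after_pool)
print("Conv1D parameters:", conv1d_params(EMBED_DIM, filters=64, kernel_size=3))
print("LSTM parameters:", lstm_params(input_dim=64, units=50))
print("output layer parameters:", dense_params(input_dim=50, units=1))
```

Under these assumptions the pooling step roughly halves the number of time steps the LSTM has to process, which is the usual motivation for placing a convolutional block in front of a recurrent layer.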
Results
Table: Results for DGA Analysis (accuracy, precision, recall, and F1-score in %).

Model                   Accuracy  Precision  Recall  F1-score      TN      FP      FN      TP
Naive Bayes                 68.1       99.3    45.5      64.4  12,700      53   9,653   8,037
Decision Tree               79.7       76.5    93.8      84.3   7,654   5,099   1,091  16,599
AdaBoost                    82.8       79.2    95.6      86.6   8,300   4,453     770  16,920
Random Forest               84.1       80.5    95.8      87.5   8,651   4,102     736  16,954
Support vector machine      85.2       81.7    96.2      88.3   8,932   3,821     680  17,010
Deep neural networks        86.8       83.7    96.1      89.5   9,434   3,319     688  17,002
CNN                         94.3       92.1    98.7      95.3  11,263   1,490     233  17,457
LSTM                        94.4       93.0    97.6      95.3  11,457   1,296     421  17,269
CNN-LSTM                    95.2       93.2    99.0      96.0  11,478   1,275     174  17,516
Cost-sensitive models
CNN                         95.4       93.2    99.5      96.2  11,464   1,289      97  17,593
LSTM                        95.5       93.2    99.6      96.3  11,470   1,283      74  17,616
CNN-LSTM                    95.6       93.2    99.7      96.3  11,467   1,286      59  17,631
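Every metric column in the results tables follows from the four confusion-matrix counts, with the malicious class treated as positive. The sketch below recomputes the metrics for the cost-sensitive CNN-LSTM row of the DGA table; the function name is illustrative.

```python
def classification_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Cost-sensitive CNN-LSTM row of the DGA results table.
row = classification_metrics(tn=11467, fp=1286, fn=59, tp=17631)
percentages = {name: round(100 * value, 1) for name, value in row.items()}
print(percentages)
```

The same function can be applied to any other row to check that the reported percentages and counts agree.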
Table: Results for Email Analysis (accuracy, precision, recall, and F1-score in %).

Model                   Accuracy  Precision  Recall  F1-score      TN      FP      FN      TP
Naive Bayes                 68.8       99.4    45.3      62.2   8,122      31   5,855   4,851
Decision Tree               82.9       80.0    95.4      87.1   4,521   2,527     487  10,138
AdaBoost                    91.3       88.6    97.1      92.7   6,815   1,338     310  10,396
Random Forest               92.0       89.9    96.7      93.2   6,984   1,169     349  10,357
Support vector machine      92.3       92.3    94.4      93.3   7,304     849     599  10,107
Deep neural networks        93.0       90.0    98.6      94.1   6,980   1,173     145  10,561
CNN                         93.6       92.6    96.4      94.5   7,326     827     382  10,324
LSTM                        93.7       91.9    97.5      94.6   7,239     914     270  10,436
CNN-LSTM                    94.0       92.2    97.6      94.8   7,270     883     253  10,453
Cost-sensitive models
CNN                         94.2       92.7    97.4      95.0   7,334     819     276  10,430
LSTM                        94.3       92.7    97.6      95.1   7,333     820     254  10,452
CNN-LSTM                    94.7       92.8    98.3      95.5   7,341     812     187  10,519
Table: Results for URL Analysis (accuracy, precision, recall, and F1-score in %).

Model                   Accuracy  Precision  Recall  F1-score      TN      FP      FN      TP
Naive Bayes                 45.1       37.9    98.8      54.7     205     937       7     571
Decision Tree               81.8       73.3    72.1      72.7     990     152     161     417
AdaBoost                    87.1       83.8    76.3      79.9   1,057      85     137     441
Random Forest               90.0       90.6    78.4      84.0   1,095      47     125     453
Support vector machine      81.0       88.4    50.0      63.9   1,104      38     289     289
Deep neural networks        90.8       92.5    79.1      85.3   1,105      37     121     457
CNN                         92.9       91.9    86.5      89.1   1,098      44      78     500
LSTM                        93.4       97.2    82.7      89.3   1,128      14     100     478
CNN-LSTM                    94.4       96.5    86.5      91.2   1,124      18      78     500
Cost-sensitive models
CNN                         93.4       96.0    83.7      89.5   1,122      20      94     484
LSTM                        94.5       93.7    89.8      91.7   1,107      35      59     519
CNN-LSTM                    94.7       93.1    91.0      92.0   1,103      39      52     526
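For the CNN-LSTM rows, the effect of cost-sensitive training is easiest to see in the false-negative counts, that is, malicious samples classified as legitimate. The short sketch below computes the relative reduction from the FN values in the three results tables; it is a reading aid, not part of the original experiments, and the names are illustrative.

```python
# FN counts of the CNN-LSTM model: (cost-insensitive, cost-sensitive),
# copied from the DGA, Email, and URL results tables.
FALSE_NEGATIVES = {
    "DGA": (174, 59),
    "Email": (253, 187),
    "URL": (78, 52),
}


def relative_reduction(before, after):
    """Fraction by which a count dropped, relative to its original value."""
    return (before - after) / before


for name, (plain_fn, cost_sensitive_fn) in FALSE_NEGATIVES.items():
    pct = 100 * relative_reduction(plain_fn, cost_sensitive_fn)
    print(f"{name}: {plain_fn} -> {cost_sensitive_fn} ({pct:.1f}% fewer)")
```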
Conclusion and Future work
This paper proposes a generalized cost-sensitive deep learning model
for Cyber Security use cases such as DGA, Email, and URL analysis.
The deep learning models perform better than the classical machine
learning models.
More importantly, the cost-sensitive deep learning models perform
better than the cost-insensitive models.
The cost-sensitive hybrid model composed of CNN and LSTM can
extract both spatial and temporal features and can obtain better
performance across different types of data sets.
Implementing this model for real-time data analysis with big data and
streaming technologies is a good direction for future work.
References
[1] Berman, D. S., Buczak, A. L., Chavis, J. S., & Corbett, C. L. (2019). A survey of
deep learning methods for cyber security. Information, 10(4), 122.
[2] Mahdavifar, S., & Ghorbani, A. A. (2019). Application of deep learning to
cybersecurity: A survey. Neurocomputing, 347, 149-176.
[3] Tran, D., Mac, H., Tong, V., Tran, H. A., & Nguyen, L. G. (2018). A LSTM based
framework for handling multiclass imbalance in DGA botnet detection.
Neurocomputing, 275, 2401-2413.
[4] Yu, B., Gray, D. L., Pan, J., De Cock, M., & Nascimento, A. C. (2017, November).
Inline DGA detection with deep networks. In 2017 IEEE International Conference
on Data Mining Workshops (ICDMW) (pp. 683-692). IEEE.
[5] Vinayakumar, R., Soman, K. P., & Poornachandran, P. (2018). Evaluating deep
learning approaches to characterize and classify malicious URL’s. Journal of
Intelligent & Fuzzy Systems, 34(3), 1333-1343.
[6] Alurkar, A. A., Ranade, S. B., Joshi, S. V., Ranade, S. S., Sonewar, P. A., Mahalle,
P. N., & Deshpande, A. V. (2017, November). A proposed data science approach
for email spam classification using machine learning techniques. In 2017 Internet of
Things Business Models, Users, and Networks (pp. 1-5). IEEE.
[7] Vu, L., Nguyen, P., & Turaga, D. (2016). Firstfilter: a cost-sensitive approach to
malicious URL detection in large-scale enterprise networks. IBM Journal of
Research and Development, 60(4), 4-1.
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 

Deep Learning based Frameworks for Handling Imbalance in DGA, Email, and URL Data Analysis

  • 1. Deep Learning based Frameworks for Handling Imbalance in DGA, Email, and URL Data Analysis
    Simran K (1), Prathiksha Balakrishna (2), Vinayakumar R (3, 1), and Soman KP (1)
    (1) Center for Computational Engineering and Networking, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India. simiketha19@gmail.com
    (2) Graduate School, Computer Science Department, Texas State University. prathi.93april8@gmail.com
    (3) Division of Biomedical Informatics, Cincinnati Children’s Hospital Centre, Cincinnati, OH, United States. Vinayakumar.Ravi@cchmc.org
    December 24, 2019
  • 2. Overview: (1) Motivation, (2) Literature survey, (3) Description of data set, (4) Proposed Architecture, (5) Experimentation, (6) Results, (7) Conclusion and Future work, (8) References
  • 3. Motivation. Cyber Security deals with techniques for protecting data, programs, devices, and networks from attack, damage, and unauthorized access. Deep learning architectures have been applied to Cyber Security use cases and have achieved good accuracy. However, real-time data sets are highly imbalanced, and data mining approaches can be used to handle this imbalance properly. This work proposes a cost-sensitive deep learning based framework for Cyber Security.
  • 4. Literature survey. Various machine learning and deep learning models have been proposed to classify DGA domains, URLs, and emails [4], [5], [6]. In [3], an LSTM based model is proposed to handle multi-class imbalanced data for DGA botnet detection. A cost-sensitive framework, Firstfilter, was proposed in [7] to detect malicious URLs.
  • 5. Description of data set. Three different data sets were used in this work, one for each use case. All the data sets contain unique samples, and the train and test sets are disjoint.
    Table: Data set details
    Use case                              Malicious class   Split   Legitimate   Malicious
    Domain Generation Algorithms (DGA)    DGA generated     train       38,276      53,052
    Domain Generation Algorithms (DGA)    DGA generated     test        12,753      17,690
    Uniform Resource Locator (URL)        Malicious         train       23,374      11,116
    Uniform Resource Locator (URL)        Malicious         test         1,142         578
    Electronic mail (Email)               Spam              train       19,337      24,665
    Electronic mail (Email)               Spam              test         8,153      10,706
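To make the degree of imbalance in the train splits concrete, the sketch below derives per-class weights from the counts in the data set table. The inverse-frequency rule `n_samples / (n_classes * class_count)` is an assumption (it is the heuristic scikit-learn applies for `class_weight="balanced"`); the slides do not state how the costs were actually chosen, and the name `balanced_class_weights` is hypothetical.

```python
# Train-split counts copied from the data set table: (legitimate, malicious).
TRAIN_COUNTS = {
    "DGA": (38_276, 53_052),
    "URL": (23_374, 11_116),
    "Email": (19_337, 24_665),
}


def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed heuristic for illustration, not the weighting reported by the authors.
    """
    total = sum(counts)
    return tuple(total / (len(counts) * c) for c in counts)


if __name__ == "__main__":
    for name, counts in TRAIN_COUNTS.items():
        w_legit, w_mal = balanced_class_weights(counts)
        print(f"{name}: legitimate={w_legit:.3f}, malicious={w_mal:.3f}")
```

Under this rule the minority class of each split (legitimate for DGA and Email, malicious for URL) receives the larger weight.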
  • 6. Proposed Architecture. Figure: Cost-insensitive and cost-sensitive deep learning based architectures.
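One common way to turn a cost-insensitive binary classifier into a cost-sensitive one is to weight the loss terms of the two classes differently. The sketch below shows a class-weighted binary cross-entropy in plain Python. It is an assumption about how a cost could enter training, not the formulation shown in the figure, and `weighted_bce`, `w_pos`, and `w_neg` are hypothetical names.

```python
import math


def weighted_bce(y_true, y_prob, w_pos=1.0, w_neg=1.0, eps=1e-7):
    """Mean binary cross-entropy with separate weights for the two classes.

    y_true: iterable of 0/1 labels (1 = malicious).
    y_prob: iterable of predicted probabilities for class 1.
    w_pos scales the loss on positive samples, w_neg on negative samples.
    """
    total = 0.0
    count = 0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count


if __name__ == "__main__":
    # With w_pos=1 and w_neg=1 this reduces to the ordinary (cost-insensitive) loss.
    print(weighted_bce([1, 0], [0.9, 0.2]))
    # Raising w_pos makes errors on the positive class more expensive.
    print(weighted_bce([1, 0], [0.9, 0.2], w_pos=2.0))
```

Setting both weights to 1 recovers the cost-insensitive loss, so the same code path covers both variants.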
  • 7. Experimentation. All the classical machine learning algorithms are implemented using Scikit-learn (https://scikit-learn.org/), and the deep learning models are implemented using TensorFlow (https://www.tensorflow.org/) with the Keras (https://keras.io/) framework. To convert text into numerical values, a Keras embedding with an embedding dimension of 128 was used. Four architectures are used: DNN, CNN, LSTM, and CNN-LSTM.
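An embedding layer expects integer indices as input, so the raw strings have to be mapped to fixed-length index sequences first. The sketch below shows one way to do that in plain Python. Character-level tokenization, left padding with index 0, and the names `build_vocab` and `encode` are all assumptions; the slides do not describe the preprocessing.

```python
def build_vocab(samples):
    """Map every character seen in the samples to an index starting at 1.

    Index 0 is reserved for padding (and, in this sketch, unseen characters).
    """
    chars = sorted({ch for s in samples for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(sample, vocab, max_len):
    """Convert a string into a list of max_len integer indices."""
    ids = [vocab.get(ch, 0) for ch in sample][:max_len]  # keep the first max_len characters
    return [0] * (max_len - len(ids)) + ids              # left-pad with zeros


if __name__ == "__main__":
    vocab = build_vocab(["example.com", "xkqzjv.net"])
    print(encode("example.com", vocab, max_len=16))
```

The resulting index sequences are what an embedding layer with dimension 128 would turn into dense vectors.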
  • 8. Experimentation. For the DNN, 4 hidden layers with 512, 384, 256, and 128 units and a final dense layer with 1 unit are used, with a dropout of 0.1. The CNN uses 64 filters of filter length 3 followed by max pooling with pooling length 2, then a dense layer with 128 hidden units and a dropout of 0.3, and finally a dense layer with one unit. The LSTM contains 128 memory blocks followed by a dropout of 0.3 and a final dense layer with one unit. For the CNN-LSTM, a CNN is connected to an LSTM: the CNN has 64 filters of filter length 3 followed by max pooling with pooling length 2, followed by an LSTM with 50 memory blocks and a final dense layer with 1 unit.
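As a worked example of the CNN-LSTM configuration described above, the sketch below counts trainable parameters per layer and traces the sequence length through the convolution and pooling steps. It assumes standard layer definitions (bias terms, stride 1, no padding in the convolution, a four-gate LSTM) and a hypothetical input length of 100; none of these details are given in the slides.

```python
def conv1d_params(in_channels, filters, kernel_size):
    """Weights plus one bias per filter."""
    return (kernel_size * in_channels + 1) * filters


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_dim + units + 1) * units)


def dense_params(input_dim, units):
    """Weights plus one bias per unit."""
    return (input_dim + 1) * units


def conv_pool_length(seq_len, kernel_size, pool_size):
    """Sequence length after an unpadded stride-1 convolution and max pooling."""
    return (seq_len - kernel_size + 1) // pool_size


EMBED_DIM = 128  # embedding dimension stated in the slides
SEQ_LEN = 100    # hypothetical input length, not from the slides

cnn_lstm = {
    "conv1d": conv1d_params(EMBED_DIM, filters=64, kernel_size=3),  # 24,640
    "lstm": lstm_params(input_dim=64, units=50),                    # 23,000
    "dense": dense_params(input_dim=50, units=1),                   # 51
}

if __name__ == "__main__":
    print(cnn_lstm)
    print("time steps fed to the LSTM:", conv_pool_length(SEQ_LEN, 3, 2))
```

The embedding layer adds a further `vocabulary_size * 128` parameters, which depends on a vocabulary size the slides do not report.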
  • 9. Results. Table: Results for DGA analysis (accuracy, precision, recall, and F1-score in %).
    Model                   Accuracy Precision Recall F1-score     TN    FP    FN     TP
    Naive Bayes                 68.1      99.3   45.5     64.4 12,700    53 9,653  8,037
    Decision Tree               79.7      76.5   93.8     84.3  7,654 5,099 1,091 16,599
    AdaBoost                    82.8      79.2   95.6     86.6  8,300 4,453   770 16,920
    Random Forest               84.1      80.5   95.8     87.5  8,651 4,102   736 16,954
    Support vector machine      85.2      81.7   96.2     88.3  8,932 3,821   680 17,010
    Deep neural networks        86.8      83.7   96.1     89.5  9,434 3,319   688 17,002
    CNN                         94.3      92.1   98.7     95.3 11,263 1,490   233 17,457
    LSTM                        94.4      93.0   97.6     95.3 11,457 1,296   421 17,269
    CNN-LSTM                    95.2      93.2   99.0     96.0 11,478 1,275   174 17,516
    Cost-sensitive models
    CNN                         95.4      93.2   99.5     96.2 11,464 1,289    97 17,593
    LSTM                        95.5      93.2   99.6     96.3 11,470 1,283    74 17,616
    CNN-LSTM                    95.6      93.2   99.7     96.3 11,467 1,286    59 17,631
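The four percentage columns in the results tables follow from the confusion-matrix counts (TN, FP, FN, TP) through the standard definitions. The sketch below recomputes them for the cost-sensitive CNN-LSTM row of the DGA table; the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall, and F1-score in percent, rounded to one decimal."""
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    values = {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
    return {name: round(100 * value, 1) for name, value in values.items()}


if __name__ == "__main__":
    # Cost-sensitive CNN-LSTM row of the DGA table.
    print(binary_metrics(tn=11_467, fp=1_286, fn=59, tp=17_631))
    # → {'accuracy': 95.6, 'precision': 93.2, 'recall': 99.7, 'f1': 96.3}
```

Because FN + TP is the number of malicious test samples and TN + FP the number of legitimate ones, the same counts also give a quick consistency check against the data set table.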
  • 10. Results. Table: Results for Email analysis (accuracy, precision, recall, and F1-score in %).
    Model                   Accuracy Precision Recall F1-score     TN    FP    FN     TP
    Naive Bayes                 68.8      99.4   45.3     62.2  8,122    31 5,855  4,851
    Decision Tree               82.9      80.0   95.4     87.1  4,521 2,527   487 10,138
    AdaBoost                    91.3      88.6   97.1     92.7  6,815 1,338   310 10,396
    Random Forest               92.0      89.9   96.7     93.2  6,984 1,169   349 10,357
    Support vector machine      92.3      92.3   94.4     93.3  7,304   849   599 10,107
    Deep neural networks        93.0      90.0   98.6     94.1  6,980 1,173   145 10,561
    CNN                         93.6      92.6   96.4     94.5  7,326   827   382 10,324
    LSTM                        93.7      91.9   97.5     94.6  7,239   914   270 10,436
    CNN-LSTM                    94.0      92.2   97.6     94.8  7,270   883   253 10,453
    Cost-sensitive models
    CNN                         94.2      92.7   97.4     95.0  7,334   819   276 10,430
    LSTM                        94.3      92.7   97.6     95.1  7,333   820   254 10,452
    CNN-LSTM                    94.7      92.8   98.3     95.5  7,341   812   187 10,519
  • 11. Results. Table: Results for URL analysis (accuracy, precision, recall, and F1-score in %).
    Model                   Accuracy Precision Recall F1-score     TN    FP    FN     TP
    Naive Bayes                 45.1      37.9   98.8     54.7    205   937     7    571
    Decision Tree               81.8      73.3   72.1     72.7    990   152   161    417
    AdaBoost                    87.1      83.8   76.3     79.9  1,057    85   137    441
    Random Forest               90.0      90.6   78.4     84.0  1,095    47   125    453
    Support vector machine      81.0      88.4   50.0     63.9  1,104    38   289    289
    Deep neural networks        90.8      92.5   79.1     85.3  1,105    37   121    457
    CNN                         92.9      91.9   86.5     89.1  1,098    44    78    500
    LSTM                        93.4      97.2   82.7     89.3  1,128    14   100    478
    CNN-LSTM                    94.4      96.5   86.5     91.2  1,124    18    78    500
    Cost-sensitive models
    CNN                         93.4      96.0   83.7     89.5  1,122    20    94    484
    LSTM                        94.5      93.7   89.8     91.7  1,107    35    59    519
    CNN-LSTM                    94.7      93.1   91.0     92.0  1,103    39    52    526
  • 12. Conclusion and Future work. This paper proposes a generalized cost-sensitive deep learning model for Cyber Security use cases such as DGA, Email, and URL analysis. The deep learning models perform better than the classical machine learning models. More importantly, the cost-sensitive deep learning models perform better than the cost-insensitive models. The cost-sensitive hybrid model composed of a CNN and an LSTM can extract spatial and temporal features and can obtain better performance on any type of data set. Implementing this model for real-time data analysis with Big Data and streaming can be considered a good direction for future work.
  • 13. References I
    [1] Berman, D. S., Buczak, A. L., Chavis, J. S., & Corbett, C. L. (2019). A survey of deep learning methods for cyber security. Information, 10(4), 122.
    [2] Mahdavifar, S., & Ghorbani, A. A. (2019). Application of deep learning to cybersecurity: A survey. Neurocomputing, 347, 149-176.
    [3] Tran, D., Mac, H., Tong, V., Tran, H. A., & Nguyen, L. G. (2018). A LSTM based framework for handling multiclass imbalance in DGA botnet detection. Neurocomputing, 275, 2401-2413.
    [4] Yu, B., Gray, D. L., Pan, J., De Cock, M., & Nascimento, A. C. (2017, November). Inline DGA detection with deep networks. In 2017 IEEE International Conference on Data Mining Workshops (ICDMW) (pp. 683-692). IEEE.
    [5] Vinayakumar, R., Soman, K. P., & Poornachandran, P. (2018). Evaluating deep learning approaches to characterize and classify malicious URL’s. Journal of Intelligent & Fuzzy Systems, 34(3), 1333-1343.
  • 14. References II
    [6] Alurkar, A. A., Ranade, S. B., Joshi, S. V., Ranade, S. S., Sonewar, P. A., Mahalle, P. N., & Deshpande, A. V. (2017, November). A proposed data science approach for email spam classification using machine learning techniques. In 2017 Internet of Things Business Models, Users, and Networks (pp. 1-5). IEEE.
    [7] Vu, L., Nguyen, P., & Turaga, D. (2016). Firstfilter: a cost-sensitive approach to malicious URL detection in large-scale enterprise networks. IBM Journal of Research and Development, 60(4), 4-1.