Deriving Optimal Deep Learning Models for Image-based Malware Classification
ACM SAC, April 2022
Rikima Mitsuhashi and Takahiro Shinagawa
The University of Tokyo
Background
Cyber-attack trends and image-based malware classification
Cyber-attacks using malware continue
Malware analysis is a major burden for security analysts
Automatically classifying malware is useful
Figure: Malware variants (α, β, γ, δ)
Convolutional neural networks (CNNs) are popular for image-based malware classification
Malware images are well suited to CNNs
Malware images are created directly from malware programs, which is simple, versatile, and easy to use
Recent malware
➢ complex and sophisticated
➢ many variants
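Malware images are commonly built by reading a program as raw bytes and mapping each byte (0 to 255) to one grayscale pixel, wrapping rows at a fixed width; this is the idea behind the Malimg images. The slides do not describe the conversion, so the following is only a minimal Python sketch under that assumption: the function name `bytes_to_grayscale_rows`, the default width of 256, and zero-padding of the last row are illustrative choices, not the authors' preprocessing.

```python
from typing import List


def bytes_to_grayscale_rows(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte of a program to one grayscale pixel (0 = black, 255 = white).

    Hypothetical helper: rows are `width` pixels wide and the last row is
    zero-padded so that the result is a rectangular 2-D image.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = list(data)  # iterating over bytes yields ints in 0..255
    # Pad with zeros up to a multiple of the row width (assumed padding scheme).
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([0] * (width - remainder))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


if __name__ == "__main__":
    # Typical first bytes of a Windows PE file ("MZ" header).
    sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03])
    image = bytes_to_grayscale_rows(sample, width=4)
    print(image)  # [[77, 90, 144, 0], [3, 0, 0, 0]]
```

Before training, such an image would still need to be resized to the input shape expected by the chosen CNN; that step is left out here.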
Problem
Degree of fine-tuning
Many types of CNN models are available
Each model can be fine-tuned (fine-tuning is a method of transfer learning)
Pre-trained CNN models hold knowledge of natural objects (plants, animals, artifacts, etc.)
However, it is unclear how useful this knowledge of natural objects is for malware image classification
Figure: Knowledge of natural objects combined with malware image training data
Solution
Deriving the optimal combination of model and fine-tuning degree
Investigate 24 pre-trained models and five fine-tuning levels (120 models in total)
Freezing layers means reusing the knowledge of natural objects
Evaluated on standard datasets: Malimg (Windows malware) and Drebin (Android malware)
Figure: DenseNet121, Xception, ..., VGG19 models, each fine-tuned at five levels (Frozen all, Frozen 3/4, Frozen 1/2, Frozen 1/4, Frozen none), combining knowledge of natural objects with malware image training data
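The five fine-tuning levels can be read as the fraction of a pre-trained model's layers whose weights stay frozen. The slides do not say how the split was computed, so this minimal Python sketch assumes that layers are frozen from the input side and that the frozen count is rounded down; the helpers `frozen_layer_count` and `trainable_flags` and the 100-layer example are hypothetical. It also checks the arithmetic of 24 models × 5 levels = 120 configurations.

```python
import math
from typing import Dict, List

# Fraction of layers kept frozen (pre-trained natural-object weights reused)
# at each fine-tuning level named on the slide.
FROZEN_FRACTION: Dict[str, float] = {
    "Frozen_all": 1.0,
    "Frozen_3/4": 0.75,
    "Frozen_1/2": 0.5,
    "Frozen_1/4": 0.25,
    "Frozen_none": 0.0,
}


def frozen_layer_count(num_layers: int, level: str) -> int:
    """Number of layers to freeze, counted from the input side (assumption)."""
    return math.floor(num_layers * FROZEN_FRACTION[level])


def trainable_flags(num_layers: int, level: str) -> List[bool]:
    """True for each layer whose weights are updated during fine-tuning."""
    k = frozen_layer_count(num_layers, level)
    return [i >= k for i in range(num_layers)]


if __name__ == "__main__":
    for level in FROZEN_FRACTION:
        print(level, frozen_layer_count(100, level))
    # 24 pre-trained models x 5 levels = 120 fine-tuned models in total.
    print(24 * len(FROZEN_FRACTION))
```

With Keras, such flags could be applied by setting each layer's `trainable` attribute, e.g. `for layer, flag in zip(model.layers, flags): layer.trainable = flag`. Which layers count toward the fraction (for example, whether a new classification head is included) is another assumption the slides leave open.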
Evaluation (1/2)
Classification of the Malimg and Drebin datasets
Figure (two panels): accuracy of EfficientNetB4 at each fine-tuning level (Frozen_all, Frozen_3/4, Frozen_1/2, Frozen_1/4, Frozen_none)
Comparison of cross-validation (EfficientNetB4 on cross-validation): 98.96%¹
Comparison of hold-out validation (EfficientNetB4): 91.03%
Highlighted levels: Frozen_none, Frozen_1/4
¹ This accuracy is one of the 10 results tested in cross-validation
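The slide compares two evaluation protocols: k-fold cross-validation, where every sample is used for testing exactly once across k folds (the footnote refers to 10 results), and a single hold-out split. This minimal Python sketch generates index splits for both; k = 10, the 80/20 hold-out ratio, the fixed seed, and the function names are assumptions for illustration, not the paper's exact protocol (for example, it does not stratify by malware family).

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_splits(n: int, k: int = 10, seed: int = 0) -> List[Split]:
    """Shuffle indices once, then use each of the k folds as the test set in turn."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin keeps fold sizes within 1
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def hold_out_split(n: int, test_ratio: float = 0.2, seed: int = 0) -> Split:
    """Single shuffled split into one train set and one disjoint test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_ratio))
    return idx[n_test:], idx[:n_test]


if __name__ == "__main__":
    splits = k_fold_splits(25, k=10)
    print([len(test) for _, test in splits])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
    train, test = hold_out_split(25)
    print(len(train), len(test))  # 20 5
```

Cross-validation yields k accuracy values from the same data, while hold-out yields a single value from one fixed test set, which is one reason the two panels are reported separately.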
Evaluation (2/2)
Confusion matrix
Figure: Confusion matrix of Malimg; confusion matrix of Drebin
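A confusion matrix counts, for each true class (row), how many samples were predicted as each class (column); overall accuracy is the sum of the diagonal divided by the total number of samples. This minimal Python sketch computes both from integer label lists; the toy labels are made up and are unrelated to the Malimg or Drebin results.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """m[t][p] = number of samples with true class t predicted as class p."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    m = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def accuracy(m: List[List[int]]) -> float:
    """Correct predictions (diagonal) divided by all predictions."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy example with three hypothetical malware families (0, 1, 2).
    y_true = [0, 0, 1, 1, 2, 2, 2, 2]
    y_pred = [0, 0, 1, 2, 2, 2, 2, 1]
    m = confusion_matrix(y_true, y_pred, num_classes=3)
    print(m)            # [[2, 0, 0], [0, 1, 1], [0, 1, 3]]
    print(accuracy(m))  # 0.75
```

Off-diagonal cells show which families are confused with each other, which a single accuracy figure hides.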
Summary
Derived optimal deep learning models for image-based malware classification
EfficientNetB4 using none or only 1/4 of the natural-object knowledge (Frozen_none or Frozen_1/4)
Highest classification accuracy for image-based malware classification on the Malimg (98.96%) and Drebin (91.03%) datasets