Improving accuracy of malware detection by filtering evaluation dataset based on its similarity

In recent years, it has become more difficult to detect malware with traditional methods such as pattern matching because malware itself has improved. Machine learning-based detection has therefore been introduced, and various studies report that it achieves a higher detection rate than traditional methods. However, it is well known that detection accuracy degrades significantly on data that differs from the training dataset. This study provides a method to improve detection accuracy by filtering the evaluation dataset based on the similarity between the evaluation and training datasets.


1. Improving accuracy of malware detection by filtering evaluation dataset based on its similarity
   Junichi Murakami, Director of Advanced Development
   FFRI, Inc. (Fourteenforty Research Institute, Inc.)
   http://www.ffri.jp
2. Preface
   • These slides were used for a presentation at CSS2013
     – http://www.iwsec.org/css/2013/english/index.html
   • Please refer to the original paper for the detailed data
     – http://www.ffri.jp/assets/files/research/research_papers/MWS2013_paper.pdf (written in Japanese, but the figures are the same)
   • Contact information
     – research-feedback@ffri.jp
     – @FFRI_Research (Twitter)
3. Agenda
   • Background
   • Problem
   • Scope and purpose
   • Experiment 1
   • Experiment 2
   • Experiment 3
   • Consideration
   • Conclusion
4. Background – malware and its detection
   [Diagram: malware generators and obfuscators drive the increase in malware, and targeted attacks use unknown malware; this exposes the limitations of signature matching and motivates other methods: heuristics, big data, machine learning, and cloud reputation]
5. Background – Related works
   • Mainly focusing on a combination of the factors below
     – feature selection and modification, parameter settings
   • Some good results are reported (TPR above 90%, FPR below 1%)

   Features               Algorithms         Evaluation
   Static information     SVM                TPR/FPR, etc.
   Dynamic information    Naive Bayes        Accuracy, precision
   Hybrid                 Perceptron, etc.   ROC curve, etc.
6. Problem
   • General theory of machine learning:
     – accuracy of classification declines if the trends of the training and testing data differ
   • How about malware and benign files?
7. Scope and purpose
   ① Investigating differences between the similarities of malware and of benign files (Experiment-1)
   ② Investigating how that difference affects classification accuracy (Experiment-2)
   ③ Based on the results above, confirming the effect of removing data whose similarity to the training data is low (Experiment-3)
8. Experiment-1 (1/3)
   • Used FFRI Dataset 2013 and benign files we collected as datasets
   • Calculated the similarity of each pair of malware and of benign files (Jubatus, MinHash)
   • Feature vector: counts of 4-grams of sequential API calls
     – ex: NtCreateFile_NtWriteFile_NtWriteFile_NtClose: n times
           NtSetInformationFile_NtClose_NtClose_NtOpenMutex: m times
   [Table: pairwise similarity matrix (values such as 0.8, 0.52, 1.0), computed separately for malware and for benign files]
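To make the feature and similarity computation concrete, here is a minimal pure-Python sketch. The paper used Jubatus's MinHash over 4-gram counts; this version estimates Jaccard similarity over the unweighted 4-gram set, and the API traces are made up for illustration:

    import hashlib

    def api_4grams(api_calls):
        # Every 4 consecutive API calls joined into one feature string
        return {"_".join(api_calls[i:i + 4]) for i in range(len(api_calls) - 3)}

    def minhash_signature(features, num_hashes=128):
        # One seeded hash per slot; keep the minimum hash value over the set
        sig = []
        for seed in range(num_hashes):
            sig.append(min(
                int.from_bytes(hashlib.md5(f"{seed}:{f}".encode()).digest()[:8], "big")
                for f in features))
        return sig

    def similarity(sig_a, sig_b):
        # Fraction of matching slots estimates the Jaccard similarity
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

    trace_a = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose", "NtClose"]
    trace_b = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose", "NtOpenMutex"]
    sig_a = minhash_signature(api_4grams(trace_a))
    sig_b = minhash_signature(api_4grams(trace_b))
    print(similarity(sig_a, sig_b))  # estimated similarity in [0.0, 1.0]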
9. Experiment-1 (2/3)
   Grouping malware and benign files based on their similarities
   [Figure: files are grouped when their similarity reaches a threshold (0.0 - 1.0), separately for malware and for benign files]
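A sketch of that grouping, reusing similarity() from above: a file with at least one peer at or above the threshold is "not unique", otherwise "unique" (the labels used on the next slide). The quadratic peer search is only for illustration; an LSH index would scale better.

    def split_by_similarity(signatures, threshold):
        # Partition sample indices into "not unique" (has a similar peer)
        # and "unique" (no peer reaches the threshold)
        not_unique, unique = [], []
        for i, sig in enumerate(signatures):
            has_peer = any(similarity(sig, other) >= threshold
                           for j, other in enumerate(signatures) if j != i)
            (not_unique if has_peer else unique).append(i)
        return not_unique, unique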
10. Experiment-1 (3/3)
    It is more difficult to find similar benign files than similar malware
    [Figure: stacked bars of the unique (no similar peer) vs. not-unique share (0-100%) for malware and for benign files, at similarity thresholds 0.8, 0.85, 0.9, 0.95, and 1]
11. Experiment-2 (1/3)
    • How much does the difference affect the result?
    • 50% of the malware/benign files are assigned to the training dataset and the rest to the testing dataset (Jubatus, AROW)
    [Figure: Jubatus is trained on the training halves and then classifies the testing halves, yielding TPR and FPR. TPR: True Positive Rate; FPR: False Positive Rate]
12. Experiment-2 (2/3)
    • How much does the difference affect the result?
    • 50% of the malware/benign files are assigned to the training dataset and the rest to the testing dataset (Jubatus, AROW)
    [Figure: the same train/classify flow as the previous slide; the results on the next slide are reported separately for not-unique and unique data]
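The evaluation can be sketched as follows. AROW itself is not in scikit-learn, so PassiveAggressiveClassifier, another margin-based online linear learner, stands in here; X is assumed to be a NumPy matrix of 4-gram counts and y the label array (1 = malware, 0 = benign):

    import numpy as np
    from sklearn.linear_model import PassiveAggressiveClassifier
    from sklearn.model_selection import train_test_split

    def evaluate(X, y, seed=0):
        # 50/50 split as in the experiment, then measure TPR and FPR
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.5, stratify=y, random_state=seed)
        clf = PassiveAggressiveClassifier(random_state=seed).fit(X_tr, y_tr)
        pred = clf.predict(X_te)
        tpr = np.mean(pred[y_te == 1] == 1)  # detected malware / all malware
        fpr = np.mean(pred[y_te == 0] == 1)  # misflagged benign / all benign
        return tpr, fpr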
13. Experiment-2 (3/3)
    The accuracy declines if the trends of the training and testing data differ:
    ■ TPR: 97.996% (not unique) vs. 81.297% (unique), a drop of 16.699 points
    ■ FPR: 0.624% (not unique) vs. 4.49% (unique), a rise of 3.866 points
14. Experiment-3 (1/6) – After training
    [Figure: feature space after training; the dividing line separates malware(train) from benign(train), with benign(test) and malware(test) points also plotted]
15. Experiment-3 (2/6) – After classification
    [Figure: the same feature space after classification; test points fall on either side of the dividing line]
16. Experiment-3 (2/6) – After classification (cont.)
    [Figure: as above, with misclassified test points marked: FP (benign on the malware side of the dividing line) and FN (malware on the benign side)]
17. Experiment-3 (3/6) – Low-similarity data
    [Figure: test points with low similarity to the training data lie far from the trained regions, producing FNs or only accidental TPs; these are the points the filtering targets]
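The filtering step itself is small. A sketch, assuming the gate is each test sample's maximum similarity to any training sample (signatures and similarity() as in Experiment-1):

    def filter_test_set(test_sigs, train_sigs, threshold):
        # Keep test samples similar enough to something already trained;
        # the rest are left unclassified rather than forced to a verdict
        kept, filtered = [], []
        for i, sig in enumerate(test_sigs):
            best = max(similarity(sig, t) for t in train_sigs)
            (kept if best >= threshold else filtered).append(i)
        return kept, filtered

TPR and FPR are then computed over the kept samples only, which is what the next three slides plot against the threshold.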
18. Experiment-3 (4/6) – Effect on TPR
    [Figure: TP and FN counts (left axis, 0-1400) and the TPR curve (right axis, 0.88-1.00) against the similarity threshold (0 and 0.6-1.0)]
19. Experiment-3 (5/6) – Effect on FPR
    [Figure: TN and FP counts (left axis, 0-2500) and the FPR curve (right axis, 0.000-0.014) against the similarity threshold (0 and 0.6-1.0)]
20. Experiment-3 (6/6) – Transition of the number of classified data
    [Figure: classified data as a share of all testing data (0-120%), for malware and for benign software, against the similarity threshold (0 and 0.6-1.0)]
21. Consideration (1/3)
    • In a real scenario:
      – we try to classify whether an unknown file/process is benign or not
    • If we apply Experiment-3:
      – files are classified only if similar data has already been trained
      – otherwise files are left unclassified, which results in
        • an FN if the file is malware
        • a TN if the file is benign (which is fine as a result)
    • Therefore the problem is the TPR for unique malware (unique malware is likely to go undetected)
22. Consideration (2/3)
    • If malware continues to have as many variants as it does now
      – ML-based detection works well
    • Having many variants ∝ malware generators/obfuscators
    • We have to investigate
      – trends in the usage of the tools above
      – the possibility of anti-machine-learning detection
23. Consideration (3/3)
    • How to deal with unclassified (filtered) data (a sketch follows below)
      1. Using other feature vectors
      2. Enlarging the training dataset (unique → not unique)
      3. Using other methods besides ML
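A hedged sketch of how options 1 and 3 could be chained behind the similarity gate (option 2, enlarging the training dataset, happens offline); the three predict functions are hypothetical caller-supplied hooks, not APIs from the paper:

    def classify_with_fallbacks(api_calls, train_sigs, ml_predict,
                                alt_predict, non_ml_predict, threshold=0.8):
        # Gate: only trust the ML classifier when similar data was trained
        sig = minhash_signature(api_4grams(api_calls))
        if max(similarity(sig, t) for t in train_sigs) >= threshold:
            return ml_predict(api_calls)
        verdict = alt_predict(api_calls)    # 1. other feature vectors
        if verdict is not None:
            return verdict
        return non_ml_predict(api_calls)    # 3. a non-ML method, e.g. signatures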
24. Conclusion
    • The distributions of similarity for malware and for benign files are different (Experiment-1)
    • Accuracy declines if the trends of the training and testing data differ (Experiment-2)
    • The TPR for unique malware declines when we remove low-similarity data (Experiment-3)
    • Continual investigation of trends in malware and its related tools is required
    • (It might be necessary to develop technology to determine benign files)
