Effective Malware Detection based on
Behavior and Data Features
Zhiwu Xu, Cheng Wen, Shengchao Qin, and Zhong Ming
College of Computer Science and Software Engineering,
Shenzhen University, China
Introduction
Approach
Experiments
Conclusion
Malware
• Malicious software:
  - Computer viruses, worms, Trojan horses, ransomware, spyware, adware, scareware, and other intrusive code
• Recent report from McAfee:
  - More than 650 million malware samples detected in Q1 2017, of which more than 30 million are new.
Signature-based method
• Compares samples against known signatures
  - Comodo, McAfee, Kaspersky, Kingsoft, and Symantec
• Can be easily evaded by evasion techniques
  - packing, variable renaming, and polymorphism
Heuristic-based method
• Identifies malicious patterns through either static analysis or dynamic analysis
• However, heavyweight or inefficient
Machine learning approaches
Most existing work focuses on behaviour features, without data information
  - binary codes, opcodes, and API calls
• Can be easily evaded
  - previously unseen behaviors
  - obfuscation
Our approach
• Based on machine learning
• Considers both the behaviour information and the data information
• Considers time-split samples and obfuscated samples
Framework: Feature Extractor
Feature Extractor
• Decompilation
• Information Extraction
• Feature Selection and Representation
Feature Extractor: Decompilation
Executable → decompilation tool → ASM code
Feature Extractor: Information Extraction
• Opcode
• System call
• Data type (e.g. int *)
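The extraction step can be sketched roughly as below. This is only an illustration: the slides do not show the decompiler's output format, so the IDA-style listing in `SAMPLE_ASM`, the three regular expressions, and the `op:`/`type:`/`lib:` term prefixes are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical IDA-style listing; the real decompiler output is not shown in the slides.
SAMPLE_ASM = """\
.text:00401000    push    ebp
.text:00401001    mov     ebp, esp
.text:00401003    call    ds:CreateFileA
.text:00401009    ; int *buffer
.text:0040100A    pop     ebp
.text:0040100B    retn
.idata:00402000   ; Imports from KERNEL32.dll
"""

# Assumed patterns: a mnemonic after a .text address, a C-like type in a
# comment, and an "Imports from <library>" comment.
OPCODE_RE = re.compile(r"^\.text:[0-9A-Fa-f]+\s+([a-z]+)\b")
TYPE_RE = re.compile(
    r";\s*((?:unsigned\s+)?(?:int|char|short|long|void)\b(?:\s*\*+)?)\s*\w+"
)
IMPORT_RE = re.compile(r"Imports from\s+([\w.]+)", re.IGNORECASE)


def extract_terms(asm_text):
    """Count opcode, data-type, and library terms found in an asm listing."""
    terms = Counter()
    for line in asm_text.splitlines():
        match = OPCODE_RE.match(line)
        if match:
            terms["op:" + match.group(1)] += 1
        match = TYPE_RE.search(line)
        if match:
            # Collapse internal whitespace so "int  *" and "int *" agree.
            terms["type:" + " ".join(match.group(1).split())] += 1
        match = IMPORT_RE.search(line)
        if match:
            terms["lib:" + match.group(1).lower()] += 1
    return terms


terms = extract_terms(SAMPLE_ASM)
```

Keeping the three kinds of terms in one counter with distinct prefixes means a later weighting step can treat them uniformly while still telling them apart.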
Feature Extractor: Feature Selection and Representation
• Selection: Term Frequency and Inverse Document Frequency (TF-IDF)
• Representation: each executable as a vector over the selected terms
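A minimal sketch of TF-IDF weighting followed by top-k term selection is given here for illustration. The slides do not state which TF-IDF variant or ranking rule is used, so the plain `tf * log(N / df)` formula, the ranking by summed weight, and the toy opcode lists are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: one list of terms per executable. Returns one {term: weight} per doc."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def select_top_k(weights, k):
    """Rank terms by summed weight over the corpus (an assumed ranking rule)."""
    score = Counter()
    for doc_weights in weights:
        score.update(doc_weights)  # adds the weights term by term
    return sorted(score, key=lambda term: (-score[term], term))[:k]


def to_vector(doc_weights, vocab):
    """Represent one executable as a vector over the selected terms."""
    return [doc_weights.get(term, 0.0) for term in vocab]


# Toy opcode sequences standing in for three executables.
docs = [
    ["mov", "push", "call", "mov"],
    ["mov", "xor", "xor", "jmp"],
    ["mov", "push", "retn"],
]
weights = tf_idf(docs)
vocab = select_top_k(weights, 3)
vectors = [to_vector(w, vocab) for w in weights]
```

In this toy corpus `mov` appears in every document, so its weight is zero and it is never selected, which is the intended effect of the inverse document frequency factor.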
Framework: Classifier
Classifier
Classifier Training
• An executable $e$ can be represented as a vector $x$. Let $D_0$ denote the available dataset with known categories. The training problem is to find a classifier $C : X \to [0,1]$ such that
  $$\min \sum_{(x,c) \in D_0} d\bigl(C(x) - c\bigr)$$
Malware Detection
• Given an executable $e$ and its vector representation $x$, the goal of detection is to find the category $c$ such that
  $$\min_{c} \; d\bigl(C(x) - c\bigr)$$
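To make the two formulas concrete, here is a small sketch with a k-nearest-neighbour scorer standing in for the classifier C. The choice of KNN, the absolute value as the distance d, the label convention (1 = malware, 0 = benign), and the toy vectors are assumptions for illustration only.

```python
import math


def train_knn(dataset, k=3):
    """dataset: list of (vector, label) pairs. Returns a scorer C: X -> [0, 1]."""
    def classifier(x):
        nearest = sorted(dataset, key=lambda pair: math.dist(pair[0], x))[:k]
        # Fraction of malware labels among the k nearest training vectors.
        return sum(label for _, label in nearest) / len(nearest)
    return classifier


def detect(classifier, x, d=abs):
    """Pick the category c in {0, 1} that minimises d(C(x) - c)."""
    score = classifier(x)
    return min((0, 1), key=lambda c: d(score - c))


# Toy labelled dataset D0 (hypothetical feature vectors).
D0 = [
    ([0.9, 0.1], 1),
    ([0.8, 0.2], 1),
    ([0.1, 0.7], 0),
    ([0.0, 0.9], 0),
]
C = train_knn(D0, k=3)
verdict_a = detect(C, [0.85, 0.15])
verdict_b = detect(C, [0.05, 0.80])
```

With d taken as the absolute value, detection reduces to thresholding the score at 0.5; a score of exactly 0.5 resolves to 0 here because `min` returns the first of two equal candidates.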
Experiments
Malware dataset (11376 samples)
BIG 2015 Challenge
theZoo aka Malware DB
Benign dataset (8003 samples)
QIHU 360 software
(with the total size of 250 GB)
Cross Validation Experiments
10-fold cross validation
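For reference, a 10-fold split can be sketched as follows. The shuffling seed and the round-robin fold assignment are assumptions; the slides do not say how the folds were drawn. The sample count is the sum of the two dataset sizes given above (11376 + 8003).

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


N_SAMPLES = 11376 + 8003  # malware + benign samples from the dataset slide
splits = list(k_fold_indices(N_SAMPLES, k=10))
```

Each sample lands in exactly one test fold, so every executable is scored once by a classifier that never saw it during training.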
Decompilation: 250 GB in 15.6 hours (0.22 s/MB)
Feature extraction: 182 GB in 10.5 hours (0.20 s/MB)
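The per-megabyte rates follow from the totals, as this short check shows. It assumes 1 GB = 1024 MB, which the slides do not state.

```python
def seconds_per_mb(size_gb, hours, mb_per_gb=1024):
    """Average processing time per megabyte, given total size and wall time."""
    return hours * 3600 / (size_gb * mb_per_gb)


decompile_rate = seconds_per_mb(250, 15.6)  # decompilation
extract_rate = seconds_per_mb(182, 10.5)    # feature extraction

print(f"{decompile_rate:.2f} s/MB")  # 0.22 s/MB
print(f"{extract_rate:.2f} s/MB")    # 0.20 s/MB
```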
Runtime performance
Classifier                    Training Time (s)    Testing Time (s)
KNN (k = 1)                   0 + (16.477)         178.789
KNN (k = 3)                   0 + (16.369)         199.474
KNN (k = 5)                   0 + (16.517)         207.052
KNN (k = 7)                   0 + (16.238)         210.557
DT (criterion = ‘gini’)       23.442               0.067
DT (criterion = ‘entropy’)    13.485               0.066
RF (n = 10, gini)             4.115                0.086
RF (n = 10, entropy)          3.791                0.077
Gaussian Naïve Bayes          3.093                0.480
Multinomial Naïve Bayes       1.535                0.035
Bernoulli Naïve Bayes         1.826                0.828
SVM (kernel = ‘linear’)       150.022              14.494
SVM (kernel = ‘rbf’)          799.310              50.196
SVM (kernel = ‘sigmoid’)      1303.607             130.178
SGD Classifier                22.569               0.048
Feature Experiment
Time-Split Experiment
We use fresh malware samples from the DAS MALWERK website,
collected between January 2017 and July 2017.
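The idea of a time-split evaluation can be sketched as below: train only on samples seen before a cutoff date, then measure how many newer malware samples are still flagged. The record layout, the cutoff date, the toy classifier, and the sample values are all hypothetical.

```python
from datetime import date


def time_split(samples, cutoff):
    """Separate samples first seen before the cutoff from those seen on or after it."""
    older = [s for s in samples if s["first_seen"] < cutoff]
    newer = [s for s in samples if s["first_seen"] >= cutoff]
    return older, newer


def detection_rate(classify, malware_samples):
    """Fraction of the given malware samples that the classifier flags as malware (1)."""
    hits = sum(1 for s in malware_samples if classify(s["features"]) == 1)
    return hits / len(malware_samples)


# Hypothetical records; real feature vectors would come from the extractor.
samples = [
    {"first_seen": date(2015, 4, 1), "features": [0.9], "label": 1},
    {"first_seen": date(2016, 8, 15), "features": [0.1], "label": 0},
    {"first_seen": date(2017, 2, 3), "features": [0.8], "label": 1},
    {"first_seen": date(2017, 6, 20), "features": [0.4], "label": 1},
]
train_set, fresh_set = time_split(samples, date(2017, 1, 1))


def toy_classifier(features):
    # Stand-in for a classifier trained on train_set only.
    return 1 if features[0] > 0.5 else 0


rate = detection_rate(toy_classifier, fresh_set)
```

Because the fresh set contains only samples newer than anything used in training, the resulting rate estimates how well the classifier generalises forward in time rather than how well it fits the training period.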
Obfuscation Experiments
Obfuscation tool: Obfuscator
• Changes the code execution flow
Obfuscation tool: Unest
• rewriting digital changes equivalently
• confusing the output string
• pushing the target code segment onto the stack and jumping to it to confuse the target code
• obfuscating the static libraries
Conclusion
Machine learning methods based on the opcodes,
data types and system libraries.
Carried out some interesting experiments.
Capable of detecting some fresh malware
Has a resistance to some obfuscation techniques
That’s all.
Thank you very much!

Malware_SmartCom_2017

Editor's Notes

  1. Thank you to the Session Chair. I am very honored to have this opportunity to attend this conference. The topic of my paper is “Effective Malware Detection based on Behavior and Data Features”. I am the speaker, Cheng Wen. This work was done with Zhiwu Xu, Shengchao Qin, and Zhong Ming.
  2. The outline of my talk is as follows. In the first part, I introduce the background of this research. The second part presents our approach, followed by the experiments. Finally, a brief conclusion is given. Let’s move on to the first part.
  3. Malware is a generic term that encompasses viruses, worms, spyware, and other intrusive code. It is spreading all over the world and increasing day by day, thus becoming a serious threat. According to a recent report from McAfee, more than 650 million malware samples were detected in the first quarter of 2017, of which more than 30 million were new. So the detection of malware is of major concern to both the anti-malware industry and researchers.
  4. To protect users from these threats, anti-malware software products from companies such as Comodo and McAfee provide the major defense against malware, and they employ the signature-based method. However, this method can be easily evaded by malware writers through evasion techniques.
  5. To overcome the limitation of the signature-based method, heuristic-based methods have been proposed, which focus on identifying malicious behavior patterns through either static analysis or dynamic analysis. But with the increasing number of malware samples, this method is no longer considered effective.
  6. Recently, various machine learning approaches have been proposed for detecting malware. Although some approaches achieve high accuracy on stationary data sets, that is still not enough for malware detection. Most existing work focuses on behaviour features such as binary codes, opcodes, and API calls, leaving the data information out of consideration. Such approaches can also be easily evaded: malware evolves rapidly, so it becomes hard to generalize learning models to reflect future, previously unseen behaviors. And most of the work did not consider resistance to obfuscation techniques.
  7. Next, let’s move to the second part.
  8. In this paper, we propose an effective approach to detect malware based on machine learning. Different from most existing work, we take into account not only the behaviour information but also the data information. We also consider time-split samples and obfuscated samples. Generally, the behaviour information reflects which behaviours a piece of software intends to perform, while the data information indicates which data it intends to operate on or how the data are organized.
  9. This figure shows the framework of our approach, which consists of two components, namely the feature extractor and the malware classifier. The feature extractor extracts the feature information from the executables and represents them as vectors. The malware classifier is first trained on an available dataset of executables and can then be used to detect new, unseen executables. In the following, we describe both components in more detail.
  10. The feature extractor consists of three steps: decompilation, information extraction, and feature selection and representation.
  11. An instruction or a piece of data in an executable file is represented as a series of binary codes, which are clearly not easy to read. So the first step is to transform the binary codes into a readable intermediate representation, such as assembly code, using a decompilation tool.
  12. Next, the extractor parses the asm files to extract the information, namely opcodes, data types, and system libraries. Generally, the opcodes used in an executable represent its intended behaviours, while the data types indicate the structures of the data it may operate on. In addition, the imported system libraries reflect the interaction between the executable and the system. All this information describes the possible mission of an executable in some sense, and similar executables share similar information.
  13. We use the well-known TF-IDF scheme to measure the statistical dependence. Next, the extractor selects the top-k weighted terms. Each executable can then be represented as a vector. An example of a vector is shown in the following.
  14. The other component is the malware classifier.
  15. As mentioned before, we first train our malware classifier on an available dataset of executables with known categories using a supervised machine learning method, and then use it to detect new, unseen executables.
  16. Next come the experiments.
  17. Our dataset consists of malware and benign samples. The malware dataset consists of samples from the BIG 2015 Challenge and from theZoo aka Malware DB, while the benign software was collected from the 360 software company. We used various machine learning methods to train classifiers and performed several experiments to test our approach’s ability.
  18. To evaluate the performance of our approach, we conducted 10-fold cross-validation experiments. The learning methods we used in our experiments are listed in the table. Concerning the ROC curve, most classifiers produce much better classification results.
  19. Meanwhile, we recorded the training and testing times in seconds for each cross-validation experiment. The results are shown in this table. We also evaluated how the feature extractor performs. Both the decompilation time and the extraction time are acceptable.
  20. Next, we also performed experiments on each kind of feature to assess its effectiveness. For that, we conducted the same experiments as above for each kind of feature. From the results we can see that all the features are effective for detecting malware, and that using all of the features together produced the best results. The opcode and library features have been used by much existing work in practice, so we believe that the type information can benefit malware detection in practice as well.
  21. In this section, to test our approach’s ability to detect genuinely new malware or new malware versions, we ran a time-split experiment. First, we downloaded malware samples that were collected from January 2017 to July 2017. That is to say, all of these malware samples are newer than the ones in our data set. About 81% of the samples can be detected by our classifier, which indicates that our approach can detect some new malware samples or new versions of existing malware samples. However, the results also indicate that the classifier becomes ineffective as time passes. This suggests that malware classifiers should be updated often with new data or new features in order to maintain classification accuracy.
  22. One reason malware detection is difficult is that malware writers can use obfuscation techniques. In this section, we performed some experiments to test our approach’s ability to detect new malware samples obtained by obfuscating existing ones. We used two commercial tools, Obfuscator and Unest, to obfuscate some malware samples randomly selected from our data set. The results show that all the obfuscated malware samples can be detected by our classifier. That is to say, our classifier has a resistance to some obfuscation techniques.
  23. Finally, I conclude the talk.
  24. In this work, we have proposed a malware detection approach using various machine learning methods based on opcodes, data types, and system libraries. To evaluate the proposed approach, we have carried out some interesting experiments. The experimental results have demonstrated that our classifier is capable of detecting some fresh malware and has a resistance to some obfuscation techniques.
  25. We use static analysis. In malware detection, both static analysis and dynamic analysis have their own advantages and limitations. In real applications, we suggest using static analysis first. If the file cannot be well represented, then we can try dynamic analysis.