Detecting and Classifying Self-Deleting Windows Malware Using Prefetch Files
Presentation
Research Objective
• Malware Detection
To present a technique for detecting malware execution
on Windows operating systems.
• Malware Classification
To further classify the malware into its associated family.
Motivation
• The majority of existing malware classification
techniques depend on the availability of the malware
sample, which is often not the case. For example,
Advanced Persistent Threat (APT) groups are known
to conduct targeted campaigns and then self-delete
the deployed malware. The authors of the paper
propose an approach that detects evidence of malware
execution and performs malware classification without
assuming access to the original malware sample.
• The MITRE ATT&CK framework defines this behavior as
Indicator Removal on Host: File Deletion (T1070.004).
The authors queried the ATT&CK repository and found
that 29.8% of threat actors and 33.7% of malware
campaigns have used self-deletion tactics.
Approach
The authors use Windows prefetch files to detect and
classify malware.
• A k-nearest-neighbor (kNN) classifier is trained to
detect whether a prefetch file is evidence of malware
execution.
• The authors apply a custom classifier based on
Jaccard similarity to assign the suspected malware
to its closest family.
• The dataset used in the classification process
contained 48 malware families and 4321 benign files
captured from live production systems.
Approach – Key Terms
• Prefetch Files
• Jaccard Similarity
Key Terms – Prefetch Files
Prefetch files (.pf) are cache files that Windows keeps in the
Prefetch folder of the system root (typically C:\Windows\Prefetch)
to speed up application start-up. Windows creates a prefetch
record the first time an application is run from a specific
location, so the same executable launched from two different
paths produces two separate prefetch files.
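Because the file name embeds a hash of the executable's path, a prefetch directory listing already tells an analyst which programs ran and from how many locations. The sketch below assumes the common `NAME.EXE-XXXXXXXX.pf` naming pattern (eight hexadecimal digits for the path hash) and the default `C:\Windows\Prefetch` location; the helpers `parse_prefetch_name` and `list_prefetch_records` are hypothetical, and reading the real folder normally requires administrator rights.

```python
import re
from pathlib import Path
from typing import Iterator, NamedTuple, Optional

# Assumed default location (%SystemRoot%\Prefetch); %SystemRoot% can differ per install.
DEFAULT_PREFETCH_DIR = Path(r"C:\Windows\Prefetch")

# <EXECUTABLE NAME>-<8 hex digits>.pf, where the hex digits are the path hash.
_PF_NAME = re.compile(r"^(?P<exe>.+)-(?P<hash>[0-9A-Fa-f]{8})\.pf$", re.IGNORECASE)


class PrefetchName(NamedTuple):
    executable: str
    path_hash: int


def parse_prefetch_name(filename: str) -> Optional[PrefetchName]:
    """Split a prefetch file name into the executable name and the path hash."""
    match = _PF_NAME.match(filename)
    if match is None:
        return None  # not a per-executable prefetch record (e.g. Layout.ini)
    return PrefetchName(match["exe"], int(match["hash"], 16))


def list_prefetch_records(directory: Path = DEFAULT_PREFETCH_DIR) -> Iterator[PrefetchName]:
    """Yield the parsed names of all *.pf files in a prefetch directory."""
    for entry in sorted(directory.glob("*.pf")):
        parsed = parse_prefetch_name(entry.name)
        if parsed is not None:
            yield parsed
```

For example, `parse_prefetch_name("CMD.EXE-0BD30981.pf")` yields the executable `CMD.EXE` and the hash `0x0BD30981` (a made-up value for illustration). The greedy match keeps executables whose names contain hyphens intact.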
Prefetch Directory View
Prefetch Files View in Windows OS
Prefetch Files Data
The key content found in a single prefetch file includes:
• Application (executable) name
• Hash of the executable path
• Path of the executable file
• Timestamps (creation, modification, and access time) of the executable
• Run count (number of times the application has been executed)
• Last run time
• Timestamps of the last eight runs
• Files and directories referenced by the executable
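Several of these fields sit at fixed offsets at the start of an uncompressed prefetch file. The following sketch reads only the common header (format version, file size, executable name, path hash), using the offsets given in public descriptions of the SCCA format such as the libscca documentation; `parse_scca_header` is a hypothetical helper, not WinPrefetchView's code. It assumes the data is already decompressed: Windows 10 stores prefetch files MAM-compressed, and the run count and run timestamps live at version-dependent offsets that are not parsed here.

```python
import struct
from typing import NamedTuple


class SccaHeader(NamedTuple):
    version: int      # per public format notes: 17 = XP, 23 = Vista/7, 26 = 8/8.1, 30 = 10
    file_size: int
    executable: str
    path_hash: int


def parse_scca_header(data: bytes) -> SccaHeader:
    """Parse the fixed 84-byte header of an *uncompressed* prefetch file."""
    if data[:3] == b"MAM":
        raise ValueError("MAM-compressed prefetch file; decompress it first")
    if len(data) < 84 or data[4:8] != b"SCCA":
        raise ValueError("not an SCCA prefetch file")
    # Offset 0: version, 4: "SCCA" signature, 8: unknown, 12: file size.
    version, _signature, _unknown, file_size = struct.unpack_from("<I4sII", data, 0)
    # Offset 16: executable name, 60 bytes of NUL-terminated UTF-16LE.
    raw_name = data[16:76].decode("utf-16-le", errors="replace")
    executable = raw_name.split("\x00", 1)[0]
    # Offset 76: hash of the executable's path.
    (path_hash,) = struct.unpack_from("<I", data, 76)
    return SccaHeader(version, file_size, executable, path_hash)
```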
Prefetch Files Tool – WinPrefetchView
Key Terms – Jaccard Similarity
The Jaccard similarity index (also called the Jaccard
similarity coefficient) compares two sets to see which
elements are shared and which are unique to one set.
It measures the similarity of the two sets on a scale
from 0% to 100%; the higher the percentage, the more
similar the two sets are.
Key Terms – Jaccard Similarity Calculation
Jaccard index (%) = (number of elements in both sets) /
(number of elements in either set, each counted once) × 100
$$J(A,B) = \frac{|A \cap B|}{|A \cup B|}$$
Key Terms – Jaccard Similarity Example
Suppose
A = {0,1,2,3,4}
B = {0,2,4,6,8,10,12}
Solution:
$$J(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|\{0,2,4\}|}{|\{0,1,2,3,4,6,8,10,12\}|} = \frac{3}{9} \approx 0.33$$
Similarity ≈ 33%
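The worked example can be checked in a few lines of Python. The function names are illustrative, and treating two empty sets as identical (similarity 1.0) is a convention chosen here, since the formula is undefined when the union is empty. The distance `1 - J(A, B)` is the form a nearest-neighbor classifier would use.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; by convention here, 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a: set, b: set) -> float:
    """1 - J(A, B), a distance suitable for comparing feature sets."""
    return 1.0 - jaccard_similarity(a, b)


A = {0, 1, 2, 3, 4}
B = {0, 2, 4, 6, 8, 10, 12}
print(sorted(A & B), sorted(A | B))        # [0, 2, 4] [0, 1, 2, 3, 4, 6, 8, 10, 12]
print(round(jaccard_similarity(A, B), 2))  # 0.33
```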
Proposed Methodology
The proposed scheme is divided into three phases:
1. Feature Extraction, Selection, and Engineering from
Prefetch Files
2. Malware Detection
3. Malware Family Classification
Phase # 1
Feature Extraction, Selection, and Engineering
1. Extract instances (features) from prefetch files.
2. Apply a normalization process that drops features
based on the following criteria:
• Concept drift
• Overfitting
• Low variance
• High correlation
Drop Features Criteria
Concept drift: features lose their predictive value
over time because the underlying malware changes.
Overfitting: overfitting can occur when a dataset
contains features that leak the target class at training
time.
Low variance: a feature whose value is nearly the same
across all samples (for example, a file referenced by
almost every process) carries little information for
separating the classes.
High correlation: features that are highly correlated
(strongly related to each other) are redundant and are
dropped in the scheme.
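Of the four criteria, low variance and high correlation can be computed from the data alone, whereas concept drift and target leakage need timestamps or label information. A minimal sketch of the two statistical filters, assuming each prefetch file has already been encoded as a 0/1 row over the candidate references; the thresholds `min_variance=0.01` and `max_corr=0.95` and the function names are placeholders, not values or code reported by the authors.

```python
from itertools import combinations
from math import sqrt


def binary_variance(column: list[int]) -> float:
    """Variance of a 0/1 feature: p * (1 - p), where p is the fraction of ones."""
    p = sum(column) / len(column)
    return p * (1.0 - p)


def pearson(x: list[int], y: list[int]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_features(rows, names, min_variance=0.01, max_corr=0.95):
    """Return the feature names kept after the low-variance and correlation filters.

    rows: list of 0/1 lists (one per prefetch file); names: one name per column.
    """
    columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}
    kept = [n for n in names if binary_variance(columns[n]) >= min_variance]
    dropped = set()
    for a, b in combinations(kept, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(columns[a], columns[b])) > max_corr:
            dropped.add(b)  # keep the first feature of each redundant pair
    return [n for n in kept if n not in dropped]
```

Keeping one member of each correlated pair, rather than discarding both, is a design choice made in this sketch so that the shared signal is not lost entirely.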
Features Normalization Results
Normalization reduced the data set from 4321
features (references) to 1381 features.
The final preprocessed feature set is
represented as an unordered set, or bag of
words (BoW), of the references made by a
process, and it is used to train both the malware
detector and the family classifier.
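A bag of references can be built by de-duplicating the paths a prefetch file lists and, when a fixed-length vector is needed, marking which vocabulary terms are present. In this sketch the normalization (lower-casing and stripping the machine-specific `\VOLUME{...}` prefix that Windows 10 prefetch path strings carry) is an assumption made so that the same file matches across machines; the slides do not describe the authors' exact normalization, and the function names are hypothetical.

```python
import re

# Machine-specific volume prefix seen in Windows 10 prefetch path strings,
# e.g. "\VOLUME{01d2...-7a1c...}\WINDOWS\...". Stripping it is an assumption.
_VOLUME_PREFIX = re.compile(r"^\\VOLUME\{[^}]*\}", re.IGNORECASE)


def to_bag_of_references(references: list[str]) -> frozenset[str]:
    """Unordered, de-duplicated set of normalized file/directory references."""
    return frozenset(_VOLUME_PREFIX.sub("", ref).lower() for ref in references)


def to_binary_vector(bag: frozenset[str], vocabulary: list[str]) -> list[int]:
    """1 if the vocabulary term was referenced by the process, else 0."""
    return [1 if term in bag else 0 for term in vocabulary]
```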
Phase # 2
Malware Detection
The malware detector identifies which process's prefetch
file contains evidence of malicious behavior. The steps
are:
• Train a k-nearest-neighbor (kNN) model using the
Jaccard distance metric and K = 5.
• Windows can store up to 1024 prefetch files,
associated with the 1024 most recently run programs.
• Extract features from each of these (up to) 1024
prefetch files.
• Compare the processed features against a labeled
dataset of malware and benign software.
• The 5 closest training instances decide whether the
file is classified as benign or malware.
Note: the prefetch file is not itself malware; it holds
forensic evidence of the malicious process's execution.
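A brute-force version of the detection step: rank the labeled training sets by Jaccard distance to the suspect feature set, keep the K = 5 nearest, and take the majority label. Majority voting is the standard kNN rule and is assumed here, since the slides do not say how neighbors are weighted; `knn_classify` and `scan` are hypothetical names.

```python
from collections import Counter


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |A ∩ B| / |A ∪ B| (0.0 for two empty sets)."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def knn_classify(sample: frozenset, training: list[tuple[frozenset, str]], k: int = 5) -> str:
    """Label a prefetch feature set by majority vote of its k nearest neighbors.

    training: (feature_set, label) pairs with labels such as "malware" / "benign".
    """
    neighbors = sorted(training, key=lambda item: jaccard_distance(sample, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def scan(prefetch_sets: dict[str, frozenset], training, k: int = 5) -> list[str]:
    """Return the names of prefetch files whose feature sets are labeled malware."""
    return [name for name, feats in prefetch_sets.items()
            if knn_classify(feats, training, k) == "malware"]
```

With two labels and an odd K such as 5, a vote cannot tie; ties in distance are broken by the order of the training list because Python's sort is stable.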
Phase # 3
Malware Classification
• Once a malicious process's prefetch file is identified,
its feature set is used to classify it into a known malware
family.
• The learned feature dataset is used to compute a
similarity metric that extends the Jaccard similarity
index to preserve program semantics.
Semantic preservation is ensured by finding:
• A minimum feature set for each family, consisting of
the features common to all prefetch feature sets of the
malware in that family.
• This minimum feature set follows the intuition that
all malware variants within a family share a common
subset of behaviors. By requiring the minimum feature
set to be present before computing similarity, the scheme
enforces the assumption that removing core features
degrades program semantics.
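One way to read the family classifier as code: compute each family's minimum feature set, skip families whose core features are missing from the suspect sample, and pick the family with the highest Jaccard similarity to any of its samples. The sketch interprets the minimum feature set as the features shared by every sample of the family (an intersection), which matches the common-subset intuition; the max-over-samples scoring and the function names are assumptions, not the authors' published metric.

```python
from functools import reduce
from typing import Optional


def jaccard(a: frozenset, b: frozenset) -> float:
    """|A ∩ B| / |A ∪ B| (1.0 for two empty sets)."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def minimum_feature_sets(family_samples: dict[str, list[frozenset]]) -> dict[str, frozenset]:
    """Features shared by every sample of each family (interpreted as an intersection)."""
    return {family: reduce(frozenset.intersection, samples)
            for family, samples in family_samples.items()}


def classify_family(sample: frozenset, family_samples: dict[str, list[frozenset]]) -> Optional[str]:
    """Most similar family among those whose minimum feature set the sample contains."""
    core = minimum_feature_sets(family_samples)
    best_family, best_score = None, -1.0
    for family, samples in family_samples.items():
        if not core[family] <= sample:
            continue  # core behavior missing: family semantics not preserved, skip it
        score = max(jaccard(sample, s) for s in samples)
        if score > best_score:
            best_family, best_score = family, score
    return best_family
```

Returning `None` when no family's core set is satisfied leaves room to report the sample as unknown instead of forcing it into the nearest family.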
Dataset
The dataset contains:
• 4,442 malware prefetch files, representing 4,442 unique
malware process executions from 48 malware families
executed in a sandbox hypervisor. The malware families
capture a variety of malware behavior, including cyber
espionage (Duqu 2.0), proxy-enabling click fraud
(Nodersok), and self-propagating worms that install
backdoors (EternalRocks). The authors executed each
malware sample in a Windows 10 virtual machine and
extracted its prefetch feature set.
• 4,296 presumed-benign prefetch files collected from
twenty Windows computers in a university computer
lab. The benign applications include common
Microsoft Office products, text editors, web browsers,
and various integrated development environments
(IDEs).
