Malware Detection using Machine Learning
By:
Shubham Dubey (14ucs114)
Malware overview
●Malicious software that tries to damage a system or gain unauthorized
access to it.
●Comes in different types:
Virus | Trojan | Adware | Worm, etc.
●AV companies find more than 100,000 new samples every day.
●Most of them are variants of each other or of older samples.
Current status of detection
●Antivirus companies currently use signature-based detection.
●A signature can be anything from a string to an assembly code
snippet.
Problem with the current method
●Polymorphic malware can change its code on every execution.
●Most malware can encrypt or pack itself using packers.
●Signature-based detection therefore does not reliably catch such malware.
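The evasion problem can be illustrated with a toy example. The sketch below assumes the simplest possible signature, a fixed byte string, and a hypothetical one-byte XOR "packer"; real packers and AV signatures are far more elaborate, but the effect is the same: the packed variant is unchanged once unpacked, yet its bytes on disk no longer contain the signature.

```python
# Toy illustration only: SIGNATURE and xor_pack are hypothetical, not a real AV engine or packer.
SIGNATURE = b"\xde\xad\xbe\xefEVIL_PAYLOAD"


def signature_match(data: bytes, signatures: list[bytes]) -> bool:
    """Return True if any known signature occurs verbatim in the file bytes."""
    return any(sig in data for sig in signatures)


def xor_pack(data: bytes, key: int) -> bytes:
    """Toy 'packer': XOR every byte with a one-byte key (applying it twice restores the data)."""
    return bytes(b ^ key for b in data)


original = b"MZ\x90\x00" + SIGNATURE + b"\x00rest-of-file"
variant = xor_pack(original, key=0x5A)  # new bytes on disk, same content after unpacking

print(signature_match(original, [SIGNATURE]))    # True: the known sample is caught
print(signature_match(variant, [SIGNATURE]))     # False: the packed variant slips through
print(xor_pack(variant, key=0x5A) == original)   # True: unpacking recovers the original
```

Every new key produces a new byte pattern, which is why the approach below looks at runtime behaviour (API calls) instead of file bytes.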
Solution using ML
●API call sequence features can be used to detect whether a file is
malicious.
●API calls are a robust basis for analysis because they cannot easily be
altered.
●They outline everything a sample does to the operating system,
including operations on files, the registry, mutexes, processes
and other system resources.
●For example, OpenFile and CreateFile describe file operations, while
OpenMutex and CreateMutex describe mutexes opened or created.
Our System Description
●Cuckoo Sandbox is used to analyze each sample and record all of its API calls.
●Each analysis report is saved as a JSON file.
●The calls are parsed and saved to a CSV file in matrix (vector) format.
●Samples with fewer than 10 calls (or another user-set value) are
ignored.
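A minimal sketch of the parsing and filtering steps, assuming the Cuckoo 2.x report.json layout in which each monitored process lists its calls under behavior/processes/calls, with each call carrying an "api" field. The field names should be checked against the Cuckoo version actually used; `MIN_CALLS`, `load_report`, `extract_api_calls` and `load_sequences` are hypothetical names, not part of Cuckoo.

```python
import json

MIN_CALLS = 10  # default cut-off from the slide; user-configurable


def load_report(path: str) -> dict:
    """Read one Cuckoo JSON report from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def extract_api_calls(report: dict) -> list[str]:
    """Flatten the API call sequence of every monitored process in a report.

    Assumes the Cuckoo 2.x layout: report["behavior"]["processes"][i]["calls"][j]["api"].
    """
    calls = []
    for process in report.get("behavior", {}).get("processes", []):
        for call in process.get("calls", []):
            calls.append(call["api"])
    return calls


def load_sequences(reports: dict[str, dict], min_calls: int = MIN_CALLS) -> dict[str, list[str]]:
    """Map sample id -> API sequence, ignoring samples with fewer than min_calls calls."""
    sequences = {}
    for sample_id, report in reports.items():
        calls = extract_api_calls(report)
        if len(calls) >= min_calls:
            sequences[sample_id] = calls
    return sequences


# In-memory stand-ins for two parsed reports (in practice: load_report("path/to/report.json")).
busy = {"behavior": {"processes": [{"calls": [{"api": "CreateFile"}] * 12}]}}
quiet = {"behavior": {"processes": [{"calls": [{"api": "OpenMutex"}] * 3}]}}
sequences = load_sequences({"S1": busy, "S2": quiet})
print(sorted(sequences))  # ['S1']: S2 made only 3 calls and is ignored
```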
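Feature extraction
●The frequency representation approach has been taken: each sample is
represented by how many times it calls each API.
●In the feature matrix, rows S1, S2, ... are samples and columns API1,
API2, ... are API functions; each cell holds the number of calls the
sample made to that API.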
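The frequency representation can be sketched as follows, assuming each sample has already been reduced to a list of API names. Columns are sorted alphabetically and APIs a sample never calls get a count of 0; the example sequences and function names are illustrative only.

```python
import csv
import io
from collections import Counter


def frequency_matrix(sequences: dict[str, list[str]]) -> tuple[list[str], list[list[int]]]:
    """Return (api_columns, rows): one row per sample, one call count per API."""
    counts = {sample: Counter(calls) for sample, calls in sequences.items()}
    apis = sorted(set().union(*counts.values()))
    rows = [[counts[sample][api] for api in apis] for sample in sequences]
    return apis, rows


def write_csv(sequences: dict[str, list[str]], out) -> None:
    """Write the sample x API frequency matrix as CSV with a header row."""
    apis, rows = frequency_matrix(sequences)
    writer = csv.writer(out)
    writer.writerow(["sample", *apis])
    for sample, row in zip(sequences, rows):
        writer.writerow([sample, *row])


seqs = {
    "S1": ["CreateFile", "WriteFile", "CreateFile"],
    "S2": ["CreateMutex"],
}
buf = io.StringIO()  # stands in for open("features.csv", "w", newline="")
write_csv(seqs, buf)
print(buf.getvalue())
```

The printed CSV has the header `sample,CreateFile,CreateMutex,WriteFile` followed by the rows `S1,2,0,1` and `S2,0,1,0`.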
Redundant subsequence removal methods
●A large number of useless API call sequences are present.
●They can be removed using N-gram subsequence extraction over each
sample.
●If an API calling pattern is present in many samples, it is removed
(this works like a sliding window).
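The slide does not define how many samples count as "many", so the sketch below assumes a hypothetical `max_fraction` cut-off: an N-gram (n consecutive API calls, taken with a sliding window of step 1) that appears in more than that fraction of samples is treated as redundant and dropped. The values of n and of the cut-off are assumptions to be tuned.

```python
from collections import Counter

NGram = tuple[str, ...]


def ngrams(calls: list[str], n: int) -> list[NGram]:
    """All length-n windows over the call sequence (sliding window, step 1)."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def common_ngrams(sequences: dict[str, list[str]], n: int, max_fraction: float) -> set[NGram]:
    """N-grams that occur in more than max_fraction of all samples."""
    doc_freq = Counter()
    for calls in sequences.values():
        doc_freq.update(set(ngrams(calls, n)))  # count each n-gram at most once per sample
    limit = max_fraction * len(sequences)
    return {gram for gram, df in doc_freq.items() if df > limit}


def remove_common(sequences: dict[str, list[str]], n: int, max_fraction: float) -> dict[str, list[NGram]]:
    """Per-sample n-gram lists with the ubiquitous (redundant) n-grams filtered out."""
    common = common_ngrams(sequences, n, max_fraction)
    return {s: [g for g in ngrams(calls, n) if g not in common] for s, calls in sequences.items()}


seqs = {
    "S1": ["LoadLibrary", "GetProcAddress", "CreateFile", "WriteFile"],
    "S2": ["LoadLibrary", "GetProcAddress", "CreateMutex"],
    "S3": ["LoadLibrary", "GetProcAddress", "RegSetValue"],
}
# ("LoadLibrary", "GetProcAddress") occurs in all three samples, so it is removed.
print(remove_common(seqs, n=2, max_fraction=0.5))
```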
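Redundant subsequence removal methods
●Another method is to use information gain.
●C is the class variable of the malware detection system, and H(C) is
its information entropy:
$H(C) = -\sum_{k} p(c_k)\log_2 p(c_k)$
●The information gain of the subsequence T with respect to class C is:
$IG(C, T) = H(C) - p(t_i)\,H(C \mid t_i) - p(t_j)\,H(C \mid t_j)$
●p(t_i) is the probability that the feature appears and p(t_j) = 1 - p(t_i)
is the probability that it does not.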
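A minimal sketch of the information-gain criterion for one subsequence feature T, using the standard definitions H(C) = −Σ p(c) log2 p(c) and IG(C, T) = H(C) − p(t_i)·H(C | t_i) − p(t_j)·H(C | t_j), where t_i means the subsequence is present in a sample and t_j that it is absent. Base-2 logarithms are an assumption. A subsequence with low information gain says little about the class, which makes it a candidate for removal.

```python
import math
from collections import Counter


def entropy(labels: list[str]) -> float:
    """H(C) = -sum_c p(c) * log2 p(c) over the class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in Counter(labels).values())


def information_gain(has_feature: list[bool], labels: list[str]) -> float:
    """IG(C, T) for a binary feature T (subsequence present / absent in each sample)."""
    present = [c for f, c in zip(has_feature, labels) if f]      # samples with t_i
    absent = [c for f, c in zip(has_feature, labels) if not f]   # samples with t_j
    n = len(labels)
    return (entropy(labels)
            - len(present) / n * entropy(present)
            - len(absent) / n * entropy(absent))


labels = ["malicious", "malicious", "benign", "benign"]
print(information_gain([True, True, False, False], labels))  # 1.0: perfectly separating
print(information_gain([True, True, True, True], labels))    # 0.0: present in every sample
```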
Using machine learning methods
●After the features are extracted and selected, machine learning
methods can be applied to the resulting data.
●The R packages used to implement the algorithms are:
Random Forest – randomForest
K-Nearest Neighbours – class
Support Vector Machines – kernlab
J48 Decision Tree – RWeka
Comparison method
●The Cuckoo analysis score indicates how malicious an analyzed file is.
●There are three severity levels, each with its own score: 1 for low,
2 for medium and 3 for high.
●Because there is no threshold value that says whether a sample is
malicious, the accuracy of this score as a detector is hard to measure.
●The score can nonetheless be compared with the results produced by the ML algorithms.
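The threshold problem can be made concrete with a small sketch. It assumes that a Cuckoo report lists matched signatures under "signatures", each with an integer "severity" (1 low, 2 medium, 3 high), sums those severities per sample, and measures how often "sum ≥ threshold" agrees with the ML verdict. The threshold is a hypothetical free parameter, and the agreement figure shifts as it changes; the reports and predictions below are made up.

```python
def cuckoo_severity(report: dict) -> int:
    """Sum the severity (1 = low, 2 = medium, 3 = high) of every matched signature.

    Assumes each entry of report["signatures"] carries an integer "severity".
    """
    return sum(sig.get("severity", 0) for sig in report.get("signatures", []))


def agreement(reports: list[dict], ml_malicious: list[bool], threshold: int) -> float:
    """Fraction of samples where 'severity >= threshold' matches the ML verdict."""
    verdicts = [cuckoo_severity(r) >= threshold for r in reports]
    return sum(v == m for v, m in zip(verdicts, ml_malicious)) / len(reports)


reports = [
    {"signatures": [{"severity": 3}, {"severity": 2}]},  # total 5
    {"signatures": [{"severity": 1}]},                   # total 1
    {"signatures": []},                                  # total 0
]
ml_malicious = [True, True, False]  # hypothetical ML predictions

for t in (1, 2, 6):
    print(t, agreement(reports, ml_malicious, t))
```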
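Results
●The accuracy of detection is measured as the percentage of correctly
identified instances:
$\text{Accuracy} = \frac{\text{correctly identified instances}}{\text{total instances}} \times 100\%$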
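The accuracy measure is simple enough to state in code. The sketch also collapses hypothetical multi-class labels (malware types such as trojan or worm) into malicious/benign, which shows how binary accuracy can exceed multi-class accuracy on the same predictions: mistaking one malware type for another is still a correct binary decision. The labels are illustrative, not the author's data.

```python
def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Percentage of instances whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def to_binary(labels: list[str], benign: str = "benign") -> list[str]:
    """Collapse multi-class labels into 'benign' / 'malicious'."""
    return [benign if label == benign else "malicious" for label in labels]


y_true = ["trojan", "worm", "benign", "adware"]
y_pred = ["worm", "worm", "benign", "adware"]  # one trojan mistaken for a worm
print(accuracy(y_true, y_pred))                        # 75.0 (multi-class)
print(accuracy(to_binary(y_true), to_binary(y_pred)))  # 100.0 (binary)
```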
Support Vector Machines Results
●The overall accuracy achieved was 87.6% for multi-class classification
and 94.6% for binary classification.
Random Forest Results
●The algorithm achieved good prediction accuracy: 95.69% for
multi-class classification and 96.8% for binary classification.
KNN Results
●The best accuracy was achieved with k = 1, giving 87% for multi-class
classification and 94.6% for two-class (binary) classification.
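The original implementation used R's `class` package; the following is only a plain-Python stand-in for the same idea, assuming samples are API-frequency vectors and using Euclidean distance (which `class::knn` also uses). It shows what k controls: with k = 1 each query simply takes the label of its single nearest training sample. The training vectors and labels are made up.

```python
import math
from collections import Counter


def knn_predict(train_x: list[list[float]], train_y: list[str], query: list[float], k: int = 1) -> str:
    """Majority label among the k training vectors closest (Euclidean) to query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical API-frequency vectors (columns: CreateFile, CreateMutex, WriteFile).
train_x = [[2, 0, 1], [0, 1, 0], [3, 0, 2]]
train_y = ["malicious", "benign", "malicious"]

print(knn_predict(train_x, train_y, [0, 2, 0], k=1))  # benign: nearest is [0, 1, 0]
print(knn_predict(train_x, train_y, [0, 2, 0], k=3))  # malicious: outvoted 2 to 1
```

Larger k smooths out noisy neighbours but, as the second call shows, can also outvote a close match; the slide reports that k = 1 worked best on this data.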
Conclusion
Experiments show that the integrated machine learning classifier
performs better than standalone signature-based detection.
Conclusion
In classification, different models gave different results. The lowest
accuracy was achieved by Naive Bayes (72.34% and 55%), followed by
k-Nearest Neighbours (87% multi-class, 94.6% binary) and Support Vector
Machines (87.6% multi-class, 94.6% binary). The highest accuracy was
achieved by J48 (93.3% multi-class, 94.6% binary) and Random Forest
(95.69% multi-class, 96.8% binary).
Thank You