This document discusses challenges in malware detection and proposes using machine learning methods. It outlines issues like the enormous number of malware samples and need for automated detection. The objectives are to reduce irrelevant malware API features using feature selection. The methodology examines API calls, anti-debugging strings, and performs feature ranking selection, classification and clustering. The conclusion is that malware writers often reuse code, making it easier to trace malware families based on commonalities.
Challenges in High Accuracy of Malware Detection
1. Intro
Issues
Objectives
Methodology
Conclusion
Challenges in High Accuracy of
Malware Detection
Muhammad Najmi Ahmad Zabidi
International Islamic University Malaysia
IEEE Control & System Graduate Research Colloquium 2012
Shah Alam, Malaysia
16th July 2012
Muhammad Najmi Ahmad Zabidi ICSRGC 2012 1/26
2. About
I am a research graduate student at Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia
My current employer is International Islamic University Malaysia, Kuala Lumpur
Research area: malware detection, focusing on Windows executables
3. Malware in short
Malware is software
Maliciousness is defined by the risks the software exposes the user to
When the classification is unclear, the term "Potentially Unwanted Program/Application" (PUP/PUA) is sometimes used
4. Methods of detection
Static analysis
For this we have developed pi-ngaji, an open source, Python-based tool for static malware analysis
Dynamic analysis
For this we execute the malware in a Windows environment and dump the API traces into a text file
5. This talk outlines several challenges in the current methods of malware detection
6. Analysis of strings
Important, although not foolproof
Find interesting calls first
Considered static analysis, since the binary is not executed
7. Methods to find interesting strings
Use the strings command (on *NIX systems)
Editors
Check the Import Address Table (IAT)
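The strings-based approach can be approximated in a few lines of Python. This is only a minimal sketch and not the pi-ngaji implementation: the function name `extract_strings` and the sample byte blob are assumptions made for illustration, and the minimum length of 4 mirrors the default of the *NIX `strings` command.

```python
import re
from typing import List

def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII characters of at least min_len bytes."""
    # Printable ASCII runs from 0x20 (space) to 0x7e (~).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Hypothetical byte blob standing in for the contents of a binary file.
sample = b"\x00\x01GetProcAddress\x00\xff\xfeab\x00LoadLibraryA\x00"
print(extract_strings(sample))  # ['GetProcAddress', 'LoadLibraryA']
```

Runs shorter than the minimum length (here `ab`) are dropped, which keeps most random byte noise out of the result.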
8. Issues
Malware numbers are enormous
Automation is needed to handle the detection
Our proposal: use machine learning methods
9. Objectives
Reduce the number of malware API features, since
some are weak, irrelevant features
which are considered "noise"
A feature selection (ranking) method is chosen
10. The features
Methodology outline: API calls; Anti Debugger/AntiVM strings; Feature Ranking Selection with Information Gain; Classification and Clustering
The features are as follows:
Application Programming Interface (API) calls
XOR'ed strings
Anti-virtualization/virtual machine detectors
Binary entropy is also interesting
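Binary entropy, the last feature listed, can be computed as the Shannon entropy of the byte distribution. The sketch below is an illustration under our own assumptions (the function name `byte_entropy` is hypothetical); high values are often taken as a hint of packed or encrypted content, although this is a heuristic rather than proof.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

low = byte_entropy(b"AAAAAAAA")         # a single repeated byte value
high = byte_entropy(bytes(range(256)))  # every byte value exactly once
print(low, high)
```

A buffer with a single repeated byte gives the minimum of 0 bits, while a buffer containing every byte value equally often gives the maximum of 8 bits.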
11. Binary file structure
Figure: Structure of a PE file [Pietrek, 1994]
12. Figure: PE components, simplified
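To make the PE layout concrete, the following sketch reads a few header fields using only the Python standard library. It is an illustration, not part of the original tool: the function name `parse_pe_header` and the synthetic `image` buffer are assumptions, and a real analysis would more likely rely on a dedicated PE parsing library.

```python
import struct

def parse_pe_header(data: bytes) -> dict:
    """Read a few fields from the DOS and COFF headers of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew, at offset 0x3C, holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the 4-byte signature.
    machine, n_sections, timestamp, _symtab, _nsyms, opt_size, characteristics = \
        struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": machine,
        "sections": n_sections,
        "timestamp": timestamp,
        "optional_header_size": opt_size,
        "characteristics": characteristics,
    }

# Synthetic, header-only image built for this example (not a runnable executable).
dos_header = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40)
coff_header = struct.pack("<HHIIIHH", 0x014C, 3, 0, 0, 0, 0xE0, 0x0102)
image = dos_header + b"PE\x00\x00" + coff_header
print(parse_pe_header(image))
```

The machine value 0x014C used in the synthetic header denotes 32-bit x86; the section count tells a parser how many section table entries follow the optional header.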
13. API calls
Features are as follows:
Example of Features
GetSystemTimeAsFileTime
SetUnhandledExceptionFilter
GetCurrentProcess
TerminateProcess
LoadLibraryExW
GetVersionExW
GetProcAddress
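One plausible way to turn such API names into machine learning input is a binary presence vector over a fixed vocabulary. The sketch below is an assumption about the encoding, not a description of the exact representation used in this work; the names are spelled as in the Win32 API, and `api_feature_vector` and the observed call list are hypothetical.

```python
from typing import Iterable, List

# Vocabulary taken from the example features on the slide.
API_VOCABULARY = [
    "GetSystemTimeAsFileTime",
    "SetUnhandledExceptionFilter",
    "GetCurrentProcess",
    "TerminateProcess",
    "LoadLibraryExW",
    "GetVersionExW",
    "GetProcAddress",
]

def api_feature_vector(observed: Iterable[str], vocabulary=API_VOCABULARY) -> List[int]:
    """Return 1 for each vocabulary entry seen in the sample, else 0."""
    seen = set(observed)
    return [1 if name in seen else 0 for name in vocabulary]

# Hypothetical calls observed in one sample; CreateFileW is outside the vocabulary.
calls = ["GetProcAddress", "TerminateProcess", "CreateFileW"]
print(api_feature_vector(calls))  # [0, 0, 0, 1, 0, 0, 1]
```

Calls outside the vocabulary are ignored, so every sample maps to a vector of the same length.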
14. Anti Debugger/AntiVM strings
IsDebuggerPresent
VMCheck.dll
15.
"Red Pill":"\x0f\x01\x0d\x00\x00\x00\x00\xc3",
"VirtualPc trick":"\x0f\x3f\x07\x0b",
"VMware trick":"VMXh",
"VMCheck.dll":"\x45\xC7\x00\x01",
"VMCheck.dll for VirtualPC":"\x0f\x3f\x07\x0b\xc7\x45\xfc\xff\xff\xff\xff",
"Xen":"XenVMM", # Or XenVMMXenVMM
"Bochs & QEmu CPUID Trick":"\x44\x4d\x41\x63",
"Torpig VMM Trick": "\xE8\xED\xFF\xFF\xFF\x25\x00\x00\x00\xFF\x33\xC9\x3D\x00\x00\x00\x80\x0F\x95\xC1\x8B\xC1\xC3",
"Torpig (UPX) VMM Trick": "\x51\x51\x0F\x01\x27\x00\xC1\xFB\xB5\xD5\x35\x02\xE2\xC3\xD1\x66\x25\x32\xBD\x83\x7F\xB7\x4E\x3D\x06\x80\x0F\x95\xC1\x8B\xC1\xC3"
Source: ZeroWine source code
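A signature table like this one can be matched against a binary with a plain substring search. The sketch below is an illustration and not ZeroWine's actual scanning code: it uses a subset of the signatures as Python 3 byte strings, and the names `VM_SIGNATURES` and `find_vm_tricks` as well as the sample buffer are assumptions.

```python
# Subset of the signatures listed on the slide, written as bytes literals.
VM_SIGNATURES = {
    "Red Pill": b"\x0f\x01\x0d\x00\x00\x00\x00\xc3",
    "VirtualPc trick": b"\x0f\x3f\x07\x0b",
    "VMware trick": b"VMXh",
    "Xen": b"XenVMM",
}

def find_vm_tricks(data: bytes, signatures=VM_SIGNATURES) -> list:
    """Return the sorted names of all signatures that occur in data."""
    return sorted(name for name, sig in signatures.items() if sig in data)

# Hypothetical buffer containing two of the byte patterns.
blob = b"\x90\x90" + b"VMXh" + b"\x00" + b"\x0f\x3f\x07\x0b"
print(find_vm_tricks(blob))  # ['VMware trick', 'VirtualPc trick']
```

Short patterns such as the four-byte ones can also occur by chance in unrelated data, so a hit is a lead for further analysis rather than a verdict.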
16. Sample execution
Analyzing e665297bf9dbb2b2790e4d898d70c9e9
Analyzing registry...
[+] Malware is Adding a Key at Hive: HKEY_LOCAL_MACHINE
^G^@Label11^@^A^AÃˇ^Nreg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\
R
File Execution Options\Rx.exe" /v debugger /t REG_SZ /d %systemrot%\repair\1sass.exe /f^M
....
[+] Malware Seems to be IRC BOT: Verified By String : ADMIN
[+] Malware Seems to be IRC BOT: Verified By String : LIST
[+] Malware Seems to be IRC BOT: Verified By String : QUIT
[+] Malware Seems to be IRC BOT: Verified By String : VERSION
Analyzing interesting calls..
[+] Found an Interesting call to: FindWindow
[+] Found an Interesting call to: LoadLibraryA
[+] Found an Interesting call to: CreateProcess
[+] Found an Interesting call to: GetProcAddress
[+] Found an Interesting call to: CopyFile
[+] Found an Interesting call to: shdocvw
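A report in this style can be produced by searching the binary for known keywords. The sketch below only imitates the output format shown above; it is not the code of the tool that generated it, and the keyword lists, the function name `report` and the sample buffer are assumptions chosen for illustration.

```python
# Keyword lists taken from the strings visible in the sample output.
INTERESTING_CALLS = ["FindWindow", "LoadLibraryA", "CreateProcess",
                     "GetProcAddress", "CopyFile"]
IRC_COMMANDS = ["ADMIN", "LIST", "QUIT", "VERSION"]

def report(data: bytes) -> list:
    """Return report lines for every keyword found as a raw substring."""
    lines = []
    for word in IRC_COMMANDS:
        if word.encode("ascii") in data:
            lines.append(f"[+] Malware Seems to be IRC BOT: Verified By String : {word}")
    for name in INTERESTING_CALLS:
        if name.encode("ascii") in data:
            lines.append(f"[+] Found an Interesting call to: {name}")
    return lines

# Hypothetical buffer standing in for a sample's contents.
blob = b"\x00QUIT\x00CopyFileA\x00GetProcAddress\x00"
for line in report(blob):
    print(line)
```

Plain substring matching is cheap but prone to false positives (a word like LIST appears in many benign programs), which is one reason string analysis is important but not foolproof.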
17. Advantages on the researcher's side
Malware writers are usually "lazy", so they tend to reuse chunks of their previous code
Hence, it is easier to trace a sample back to a previous family based on the commonalities
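One simple way to quantify such commonalities is the Jaccard similarity between the sets of API calls of two samples. This is a swapped-in illustration rather than the measure used in this work; the family profiles and the new sample below are hypothetical.

```python
def jaccard(a, b) -> float:
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)

# Hypothetical API-call profiles of two known families.
families = {
    "family_a": {"FindWindow", "LoadLibraryA", "CreateProcess", "GetProcAddress"},
    "family_b": {"CopyFile", "TerminateProcess"},
}
new_sample = {"LoadLibraryA", "CreateProcess", "GetProcAddress", "CopyFile"}

scores = {name: jaccard(new_sample, calls) for name, calls in families.items()}
best = max(scores, key=scores.get)
print(scores, best)  # family_a scores 0.6, family_b scores 0.2
```

The new sample shares three of five distinct calls with the first profile, so it is attributed to that family in this toy example.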
18. Our methods
Roughly, our methods consist of:
1 Feature Selection (Ranking/Pruning)
2 Supervised Classification
3 Unsupervised Classification
Items 2) and 3) above can also be combined into a method known as "Semi-Supervised Classification".
19. Information Gain
[Zhang et al., 2007, Altaher et al., 2011, Singhal and Raul, 2012] use the following formula to apply IG to malware.
The amount by which the entropy of X decreases reflects the additional information about X provided by Y. This amount is called information gain, given by
$$IG(X \mid Y) = H(X) - H(X \mid Y)$$
[Singhal and Raul, 2012] introduced the following algorithm to "correct out" errors in the results:
$$IG(X) = IG(X) \pm \frac{\sum_{i=0}^{n} IG(X_i)}{n}$$
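The definition of information gain as entropy minus conditional entropy can be turned into a small feature-ranking routine. The sketch below is our own illustration: it assumes base-2 logarithms, treats X as the class label and Y as a feature, and uses hypothetical feature columns; it does not implement the correction step of [Singhal and Raul, 2012].

```python
import math
from collections import Counter

def entropy(labels) -> float:
    """Shannon entropy H(X) of a list of labels, in bits."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())

def information_gain(feature, labels) -> float:
    """IG(X|Y) = H(X) - H(X|Y), with X the labels and Y the feature values."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional

# Hypothetical data: four samples, two binary features.
labels = ["malware", "malware", "benign", "benign"]
features = {
    "IsDebuggerPresent": [1, 1, 0, 0],  # separates the classes perfectly
    "GetVersionExW": [1, 0, 1, 0],      # carries no class information
}
ranking = sorted(features, key=lambda f: information_gain(features[f], labels),
                 reverse=True)
print(ranking)  # ['IsDebuggerPresent', 'GetVersionExW']
```

Features at the bottom of such a ranking are the candidates for pruning as noise.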
20. Information Gain (cont'd)
From [Jiang et al., 2011]:
$$IG(t) = \sum_{c \in \{c_i, \bar{c}_i\}} \sum_{t' \in \{t, \bar{t}\}} P(t', c) \log \frac{P(t', c)}{P(t')\,P(c)}$$
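The double sum from [Jiang et al., 2011] can be evaluated directly from a 2x2 table that counts how often a term t is present or absent in samples inside and outside a class. The sketch below is an illustration under our assumptions: base-2 logarithms, probabilities estimated as relative frequencies, and a hypothetical function name and counts.

```python
import math

def information_gain_joint(n_t_c, n_t_notc, n_nott_c, n_nott_notc) -> float:
    """IG(t) summed over presence/absence of t and membership in the class."""
    total = n_t_c + n_t_notc + n_nott_c + n_nott_notc
    joint = {(1, 1): n_t_c, (1, 0): n_t_notc, (0, 1): n_nott_c, (0, 0): n_nott_notc}
    # Marginal probabilities of the term (present/absent) and the class (in/out).
    p_t = {1: (n_t_c + n_t_notc) / total, 0: (n_nott_c + n_nott_notc) / total}
    p_c = {1: (n_t_c + n_nott_c) / total, 0: (n_t_notc + n_nott_notc) / total}
    ig = 0.0
    for (t, c), count in joint.items():
        if count == 0:
            continue  # a zero-probability cell contributes nothing to the sum
        p = count / total
        ig += p * math.log2(p / (p_t[t] * p_c[c]))
    return ig

print(information_gain_joint(2, 0, 0, 2))  # term present exactly in the class
print(information_gain_joint(1, 1, 1, 1))  # term independent of the class
```

A term that occurs exactly in the class yields one full bit for balanced classes, and a term that is statistically independent of the class yields zero.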
21. For research purposes, the following issues always come up:
No standard dataset exists, unlike in the Intrusion Detection System (IDS) area
Malware samples change at a fast pace, so the datasets used for an experiment may be questioned
As a last resort, stick to the existing database and try to stay free of any specific malware family, to make sure the method will/could work with incoming, new malware
31. Table: Differences between clustering and classification
Classification                   Clustering
Deals with known data            Deals with unknown data
Supervised learning              Unsupervised learning
Popular algorithms include:      Popular algorithms include:
  Random Forest                    K-means
  Neural Networks                  Fuzzy C
  k-Nearest Neighbor               Gaussian
  Decision Trees
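As an illustration of the classification column, the sketch below implements k-Nearest Neighbor on binary feature vectors in plain Python. It is a toy example under our own assumptions (Hamming distance, k = 3, hypothetical labeled vectors), not the classifier configuration used in this work.

```python
from collections import Counter

def hamming(a, b) -> int:
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k=3):
    """train is a list of (feature_vector, label); return the majority label
    among the k training vectors closest to query."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled samples (known data), each a binary API feature vector.
train = [
    ([1, 1, 1, 0], "malware"),
    ([1, 1, 0, 0], "malware"),
    ([1, 0, 1, 0], "malware"),
    ([0, 0, 0, 1], "benign"),
    ([0, 0, 1, 1], "benign"),
    ([0, 1, 0, 1], "benign"),
]
print(knn_predict(train, [1, 0, 0, 0]))  # malware
print(knn_predict(train, [0, 0, 0, 1]))  # benign
```

The labels in `train` are what makes this supervised: the prediction is only as good as the known corpus it is drawn from.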
32. Classification (supervised) is chosen to deal with a known corpus but incomplete data
Clustering (unsupervised) is chosen to deal with new inputs
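For the clustering side, the sketch below runs a bare-bones K-means on unlabeled feature vectors. It is a simplified illustration under stated assumptions: squared Euclidean distance, a fixed number of iterations, and a deterministic start from the first k points (real implementations usually randomize the start); the sample vectors are hypothetical.

```python
def kmeans(points, k, iterations=10):
    """Return (assignment, centroids) after a fixed number of K-means rounds."""
    # Simplification: start from the first k points instead of a random pick.
    centroids = [[float(x) for x in p] for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical unlabeled samples (new inputs).
points = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0], [0, 0, 0, 1]]
assignment, centroids = kmeans(points, k=2)
print(assignment)  # [0, 1, 0, 1]
```

No labels are involved: the grouping emerges from the feature vectors alone, which is why clustering suits inputs that have not been seen before.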
33. Some results
We managed to detect several malware samples by using the existing API traces and other features (bot commands, file/registry deletion)
Newer, more sophisticated malware such as Stuxnet/Duqu is very platform specific (it attacks SCADA systems), so detecting it needs more reading
Perhaps the most obvious indicator is any XOR'ed communication channel being used
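Single-byte XOR obfuscation, the simplest form of XOR'ed content, can be uncovered by trying all 255 non-zero keys and looking for a known keyword. The sketch below is an illustration only: the function names, the keyword and the sample buffer are assumptions, and multi-byte or rolling keys would need a different approach.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)

def find_xor_keys(data: bytes, needle: bytes) -> list:
    """Return the single-byte keys (1..255) under which needle shows up in data."""
    return [key for key in range(1, 256) if needle in xor_bytes(data, key)]

# Hypothetical buffer hiding an IRC command under key 0x5A.
hidden = xor_bytes(b"JOIN #channel", 0x5A)
blob = b"\x00\x01" + hidden + b"\x02"
print(find_xor_keys(blob, b"JOIN"))  # [90]
```

Key 0 is skipped on purpose because XOR with zero leaves the data unchanged, so a hit there would just be a plain string.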
34. The flow
Figure: flow diagram with the stages Feature Selection, Feature Categorization, Clustering, Classification and Visualization; tools named in the diagram: Weka, scipy, Octave/Matlab
35.
Altaher, A., Ramadass, S., and Ali, A. (2011). Computer Virus Detection Using Features Ranking and Machine Learning. Australian Journal of Basic and Applied Sciences, 5(9):1482-1486.
Jiang, Q., Zhao, X., and Huang, K. (2011). A feature selection method for malware detection. In 2011 IEEE International Conference on Information and Automation (ICIA), pages 890-895.
Pietrek, M. (1994). Peering Inside the PE: A Tour of the Win32 Portable Executable File Format. http://msdn.microsoft.com/en-us/library/ms809762.aspx.
Singhal, P. and Raul, N. (2012). Malware detection module using machine learning algorithms to assist in centralized security in enterprise networks. International Journal of Network Security & Its Applications, 4.
Zhang, B., Yin, J., Hao, J., Wang, S., and Zhang, D. (2007). New malicious code detection based on n-gram analysis and rough set theory. pages 626-633. Springer-Verlag, Berlin, Heidelberg.