Machine Learning for Malware Detection: Beyond Accuracy Rates

Research work of my student Lucas Galante, presented at SBSEG 2019. We discuss the implications of adopting distinct machine learning models for malware detection.

Motivation Methodology Evaluation Conclusion
Machine Learning for Malware Detection:
Beyond Accuracy Rates
Lucas Galante, Marcus Botacin, André Grégio, Paulo Lício de Geus
SBSEG 2019
Machine Learning for Malware Detection: Beyond Accuracy Rates 1 / 17

Agenda
1 Motivation
   Motivation
2 Methodology
   Malware Classifier
3 Evaluation
   Beyond Accuracy Rates
4 Conclusion
   Conclusion

Motivation
Malware Increase
Figure: Increase of 46% in malware activity in Q2 of 2019.
https://tinyurl.com/y6qzn83h

Malware Classification
Figure: Necessity of antimalware applications.
https://tinyurl.com/y2bz5k58

Malware Classifier
Extracted Features
Table: Malware features classified according to extraction method (static and dynamic) and representation (discrete or continuous).
Static, discrete: Embedded files, Disassembly fail, /home string, ptrace syscall, /sys string, Network strings, Linkage, Header present, UPX, passwd string, fork syscall, compiler string
Static, continuous: Size sections, # headers, /home string, # .dynamic, /sys string, # sections, passwd string, # symbols, # libs, # relocations, Size sample, # debug section
Dynamic, both representations: fork syscall, /proc access, ptrace syscall, /home access, socket syscall, passwd access, mmap syscall, permission denied, SIGTERM, SIGSEGV
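Most static features in this table reduce to byte-level scans of the binary plus a few ELF header fields. The following is a minimal Python sketch, assuming a little-endian 64-bit ELF input. The marker byte strings (for example b"UPX!" as a UPX signature), the feature names, and the reading of "# headers" as the program-header count are our assumptions; this is not the authors' extractor.

```python
import struct

# Hypothetical byte markers modeled on the table's static string features.
# The exact patterns the authors searched for are not given in the slides.
MARKERS = {
    "upx": b"UPX!",  # assumed UPX packer signature
    "passwd": b"/etc/passwd",
    "home": b"/home",
    "sys": b"/sys",
}

# ELF64 header layout: e_ident[16], e_type, e_machine, e_version, e_entry,
# e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize,
# e_shnum, e_shstrndx (64 bytes, little-endian, no padding).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def static_features(data: bytes) -> dict:
    """Return discrete (presence) and continuous (count/size) static features."""
    features = {"size_sample": len(data)}  # continuous: file size in bytes
    for name, pattern in MARKERS.items():
        count = data.count(pattern)
        features[f"{name}_present"] = int(count > 0)  # discrete representation
        features[f"{name}_count"] = count  # continuous representation
    # Parse header counts only for little-endian ELF64 (EI_CLASS == 2, EI_DATA == 1).
    is_elf64_le = data[:4] == b"\x7fELF" and data[4:6] == b"\x02\x01"
    if is_elf64_le and len(data) >= ELF64_HEADER.size:
        fields = ELF64_HEADER.unpack_from(data)
        features["num_program_headers"] = fields[10]  # e_phnum (our reading of "# headers")
        features["num_sections"] = fields[12]  # e_shnum ("# sections")
    return features
```

Keeping both the presence flag and the raw count means a single scan can feed either the discrete or the continuous feature set.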

Classification Overview
Figure: Overview of the classification process.

Beyond Accuracy Rates
Importance of a Good Feature Extraction Procedure
SVM classification of static continuous features.
Kernel / Iterations (#)   1000      10000     100000
Poly                      49.32%    49.74%    49.95%
Linear                    73.87%    77.64%    80.94%
RBF                       84.92%    84.92%    84.92%
SVM classification of dynamic continuous features.
Kernel / Iterations (#)   1000      10000     100000
Poly                      49.92%    49.76%    50.71%
Linear                    93.73%    86.51%    86.73%
RBF                       92.63%    92.63%    92.63%
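Reading the two tables side by side makes the feature-extraction point concrete. The short Python sketch below is our own arithmetic on the numbers above, not part of the original evaluation. It takes the best accuracy each kernel reached across the three iteration budgets and reports how many percentage points dynamic features gain over static ones.

```python
# Accuracy (%) copied from the two SVM tables above, per kernel,
# at 1000, 10000 and 100000 iterations.
STATIC = {"poly": [49.32, 49.74, 49.95], "linear": [73.87, 77.64, 80.94], "rbf": [84.92] * 3}
DYNAMIC = {"poly": [49.92, 49.76, 50.71], "linear": [93.73, 86.51, 86.73], "rbf": [92.63] * 3}


def best_gain(static, dynamic):
    """Percentage points gained per kernel by the best dynamic run over the best static run."""
    return {kernel: round(max(dynamic[kernel]) - max(static[kernel]), 2) for kernel in static}


print(best_gain(STATIC, DYNAMIC))  # {'poly': 0.76, 'linear': 12.79, 'rbf': 7.71}
```

The linear and RBF kernels gain roughly 13 and 8 points from dynamic features, while the polynomial kernel stays near 50% with either feature set.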

Importance of Evaluated Datasets
Mixed dataset. Random Forest classification of static continuous features.
Max depth / Estimators (#)   16        32        64
8                            99.17%    99.06%    99.20%
16                           99.13%    99.06%    99.09%
32                           99.09%    99.13%    99.17%
VirusTotal dataset. Random Forest classification of static continuous features.
Max depth / Estimators (#)   16        32        64
8                            94.29%    94.35%    94.24%
16                           94.24%    94.14%    94.08%
32                           94.08%    94.14%    94.19%
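The same kind of arithmetic shows why the dataset matters more than the Random Forest settings. Averaged over the hyperparameter grid, accuracy drops by about 4.93 percentage points from the mixed dataset to the VirusTotal one. Within either dataset, however, the nine settings differ by at most 0.27 points. The following is a small sketch of that computation; it is our reading of the tables, not an analysis from the talk.

```python
from statistics import mean

# Random Forest accuracy (%) from the two tables above, row-major:
# max depth 8, 16, 32 x estimators 16, 32, 64.
MIXED = [99.17, 99.06, 99.20, 99.13, 99.06, 99.09, 99.09, 99.13, 99.17]
VIRUSTOTAL = [94.29, 94.35, 94.24, 94.24, 94.14, 94.08, 94.08, 94.14, 94.19]


def spread(values):
    """Max minus min accuracy across the hyperparameter grid."""
    return round(max(values) - min(values), 2)


gap = round(mean(MIXED) - mean(VIRUSTOTAL), 2)
print(f"dataset gap: {gap} points")  # dataset gap: 4.93 points
print(f"grid spread: mixed={spread(MIXED)}, VirusTotal={spread(VIRUSTOTAL)}")
# grid spread: mixed=0.14, VirusTotal=0.27
```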

Analyst Importance
SVM classification of dynamic continuous features.
Kernel / Iterations (#)   1000      10000     100000
Poly                      50.91%    54.05%    58.16%
Linear                    97.97%    97.56%    80.35%
RBF                       98.54%    98.54%    98.54%
SVM classification of dynamic discrete features.
Kernel / Iterations (#)   1000      10000     100000
Poly                      79.68%    79.91%    79.91%
Linear                    96.48%    96.48%    96.48%
RBF                       96.35%    96.35%    96.35%
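These two tables can be used to check the claim that discrete features show smaller accuracy variance. The following Python sketch is our own arithmetic, not the authors' evaluation code. For each kernel, it measures how far accuracy moves when only the iteration budget changes, computed as the maximum minus the minimum across the three budgets.

```python
# Accuracy (%) of the dynamic-feature SVMs above at 1000, 10000 and 100000 iterations.
CONTINUOUS = {"poly": [50.91, 54.05, 58.16], "linear": [97.97, 97.56, 80.35], "rbf": [98.54] * 3}
DISCRETE = {"poly": [79.68, 79.91, 79.91], "linear": [96.48] * 3, "rbf": [96.35] * 3}


def per_kernel_range(table):
    """Percentage points each kernel's accuracy moves across iteration budgets."""
    return {kernel: round(max(acc) - min(acc), 2) for kernel, acc in table.items()}


print(per_kernel_range(CONTINUOUS))  # {'poly': 7.25, 'linear': 17.62, 'rbf': 0.0}
print(per_kernel_range(DISCRETE))  # {'poly': 0.23, 'linear': 0.0, 'rbf': 0.0}
```

With continuous features, the linear kernel swings by 17.62 points. With discrete features, no kernel moves by more than 0.23.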

What ML results teach us
Static feature importance
Discrete                     Continuous
Network strings    40%       Binary size         27%
UPX present        17%       # headers           16.70%
passwd strings     1.40%     # debug sections    0.20%
Dynamic feature importance
Discrete                     Continuous
mmap               50%       # mmap              68%
fork               6%        # fork              10.80%
SIGSEGV            10.60%    # SIGSEGV           1.30%
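The dynamic half of this table pairs each feature with a discrete form (mmap: was it observed?) and a continuous form (# mmap: how many times?). The following is a minimal sketch of how both forms could be derived from one execution trace, assuming an strace-style text log. The regular expressions, the tracked feature subset, and the function name are hypothetical; this is not the authors' tracer.

```python
import re
from collections import Counter

# Assumed strace-style lines, optionally prefixed by a PID (as with `strace -f`):
#   "1234  mmap(NULL, 4096, PROT_READ, ...) = 0x7f..."
#   "1234  --- SIGSEGV {si_signo=SIGSEGV, ...} ---"
SYSCALL_RE = re.compile(r"^(?:\d+\s+)?(\w+)\(")
SIGNAL_RE = re.compile(r"^(?:\d+\s+)?--- (SIG\w+)")

# Hypothetical subset of the dynamic features listed in the feature table.
TRACKED = ("mmap", "fork", "ptrace", "socket", "SIGSEGV", "SIGTERM")


def dynamic_features(trace_lines):
    """Turn one trace into discrete (presence) and continuous (count) features."""
    events = Counter()
    for line in trace_lines:
        # Signal delivery lines first; other lines are read as syscall names.
        match = SIGNAL_RE.match(line) or SYSCALL_RE.match(line)
        if match:
            events[match.group(1)] += 1
    features = {}
    for name in TRACKED:
        features[name] = int(events[name] > 0)  # discrete, e.g. "mmap"
        features[f"# {name}"] = events[name]  # continuous, e.g. "# mmap"
    return features
```

On Linux with glibc, fork() is typically issued as a clone system call, so a real tracker would likely need to map clone onto the fork feature as well.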

Conclusion
Our results show that:
Dynamic features outperform static features.
Discrete features show smaller accuracy variance.
Each dataset's distinct characteristics pose challenges to ML models.
Feature analysis can serve as feedback information.

Acknowledgement
This work is supported by:
Brazilian National Council for Scientific and Technological Development (CNPq)
CESeg assistance

Questions, critiques, and suggestions.
Contact
galante@lasca.ic.unicamp.br
Complete version
https://github.com/marcusbotacin/ELF.Classifier
Previous work
https://github.com/marcusbotacin/Linux.Malware
Reverse Engineering Workshop
Thursday @ 13:30
