Malware Variant Detection Using Similarity Search over Sets of Control Flow Graphs

The document discusses techniques for detecting software similarity, including control flow graphs, birthmarks, and algorithms like q-grams and optimal distance. It also evaluates these techniques on malware samples, showing detection rates and false positives for different algorithms and similarity thresholds. Processing times for analyzing benign and malicious files are presented.

[Figure: The software similarity problem. A birthmark is taken from Program p and from Program q, and the two birthmarks are compared ("Similar?"), giving either "MATCH!" or "Different".]

[Figure: A control flow graph, its structured form, and its string representation. Graph nodes: L_0 to L_7, joined by edges labeled "true". Structured form: proc(){ L_0: while (v1 || v2) { L_1: if (v3) { L_2: } else { L_4: } L_5: } L_7: return; }. String representation: W|IEH}R]
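The string representation in the figure can be split into q-grams, the fixed-length substrings that a q-gram classifier uses as features. The following is a minimal Python sketch rather than the author's implementation; the function name `qgrams` and the choice of q = 4 are assumptions made for illustration.

```python
def qgrams(s: str, q: int = 4) -> list[str]:
    """Return every contiguous substring of length q, in order of appearance."""
    if q <= 0:
        raise ValueError("q must be positive")
    # A string shorter than q yields no q-grams.
    return [s[i:i + q] for i in range(len(s) - q + 1)]


# String representation of the structured control flow graph from the figure.
cfg_string = "W|IEH}R"
print(qgrams(cfg_string, 4))  # ['W|IE', '|IEH', 'IEH}', 'EH}R']
```

Counting how often each q-gram occurs across a program's control flow graphs gives a fixed-dimension feature vector that can be compared with a vector distance.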

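$M_1 = S(P_1)$
$M_2 = S(P_2)$

$M_1' = \{a_i \in M_1\} \cup \{b_j\} : 1 + |M_1| \le j \le |M_2|$
$M_2' = \{a_i \in M_2\} \cup \{b_j\} : 1 + |M_2| \le j \le |M_1|$

$C : M_1' \times M_2' \to \mathbb{R}$

$C(a, b) = \begin{cases}
|a|, & \text{if } a \in M_1,\ b \notin M_2 \\
|b|, & \text{if } b \in M_2,\ a \notin M_1 \\
ed(a, b), & \text{if } a \in M_1,\ b \in M_2
\end{cases}$

Find a bijection $f : M_1' \to M_2'$ such that the distance $d$ is minimized:

$d = \sum_{a \in M_1'} C(a, f(a))$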
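The minimization over bijections above is an instance of the assignment problem. The sketch below is a small Python illustration under several assumptions: `None` stands for a padding element, |a| is taken to be the string length, ed is taken to be the Levenshtein edit distance, and the optimum is found by brute force over permutations. The brute-force search is only a stand-in that works for tiny sets; a practical system would use a polynomial-time solver such as the Hungarian algorithm. All function names are hypothetical.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def cost(a, b) -> int:
    """C(a, b): an unmatched string costs its length, a matched pair costs ed(a, b)."""
    if a is None and b is None:
        return 0
    if b is None:
        return len(a)
    if a is None:
        return len(b)
    return edit_distance(a, b)


def assignment_distance(m1: list, m2: list) -> int:
    """Minimum total cost over all bijections between the padded sets."""
    n = max(len(m1), len(m2))
    padded1 = list(m1) + [None] * (n - len(m1))
    padded2 = list(m2) + [None] * (n - len(m2))
    # Brute force: try every pairing of padded1 with a permutation of padded2.
    return min(sum(cost(a, b) for a, b in zip(padded1, perm))
               for perm in permutations(padded2))


# Two graph strings against one: the extra string is paired with padding.
print(assignment_distance(["W|IEH}R", "IEH}R"], ["W|IEH}R"]))  # 5
```

In the example, pairing the identical strings costs 0 and the leftover five-character string costs its length, so the distance is 5.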

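[Equation garbled in extraction. The surviving terms describe a condition on programs p in a set E that involves the distance d(p, q), the query q, and the threshold expression 1 - t.]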

[Figure: The Malwise malware classification system. Flowchart nodes: Unknown Sample; New Samples From Honeypots; Malware Signature Database; From Honeypot?; New Signature; Dynamic Analysis; Packed; Emulate; End of Unpacking?; Static Analysis; Classify; Non Malicious; Malicious. Edges are labeled Yes and No.]

Malware Detection Rates

Classification Algorithm      Klez  Netsky  Roron  Frethem
Maximum                         36      49     81      289
Exact                           20      29     17      139
Heuristic Approximate           20      27     43      144
Q-Grams                         20      31     79      226
Optimal Distance                22      46     73      220
Q-Grams + Optimal Distance      20      43     73      217

False Positives

Similarity  K-Subgraphs  Q-Grams
0.0             1302161  2334251
0.1              463170   413667
0.2              356345    40055
0.3              285202     7899
0.4              200326     3790
0.5              129790      327
0.6               46320       11
0.7               10784        0
0.8                5883        0
0.9                  19        0
1.0                   0        0

False Positives with 10,000 Malware

Classification Algorithm      False Positives  FP Percentage
Q-Grams                                    10           0.62
Q-Grams + Optimal Distance                  7           0.43

Pairwise similarity between the Roron variants ao, b, d, e, g, k, m, q, and a under each classification algorithm (diagonal entries are omitted in the source).

Exact Matching

        ao     b     d     e     g     k     m     q     a
ao       -  0.44  0.28  0.27  0.28  0.55  0.44  0.44  0.47
b     0.44     -  0.27  0.27  0.27  0.51  1.00  1.00  0.58
d     0.28  0.27     -  0.48  0.56  0.27  0.27  0.27  0.27
e     0.27  0.27  0.48     -  0.59  0.27  0.27  0.27  0.27
g     0.28  0.27  0.56  0.59     -  0.27  0.27  0.27  0.27
k     0.55  0.51  0.27  0.27  0.27     -  0.51  0.51  0.75
m     0.44  1.00  0.27  0.27  0.27  0.51     -  1.00  0.58
q     0.44  1.00  0.27  0.27  0.27  0.51  1.00     -  0.58
a     0.47  0.58  0.27  0.27  0.27  0.75  0.58  0.58     -

Heuristic Approximate Matching

        ao     b     d     e     g     k     m     q     a
ao       -  0.70  0.28  0.28  0.27  0.75  0.70  0.70  0.75
b     0.74     -  0.31  0.34  0.33  0.82  1.00  1.00  0.87
d     0.28  0.29     -  0.50  0.74  0.29  0.29  0.29  0.29
e     0.31  0.34  0.50     -  0.64  0.32  0.34  0.34  0.33
g     0.27  0.33  0.74  0.64     -  0.29  0.33  0.33  0.30
k     0.75  0.82  0.29  0.30  0.29     -  0.82  0.82  0.96
m     0.74  1.00  0.31  0.34  0.33  0.82     -  1.00  0.87
q     0.74  1.00  0.31  0.34  0.33  0.82  1.00     -  0.87
a     0.75  0.87  0.30  0.31  0.30  0.96  0.87  0.87     -

Q-Grams

        ao     b     d     e     g     k     m     q     a
ao       -  0.86  0.53  0.64  0.59  0.86  0.86  0.86  0.86
b     0.88     -  0.66  0.76  0.71  0.97  1.00  1.00  0.97
d     0.65  0.72     -  0.88  0.93  0.73  0.72  0.72  0.73
e     0.72  0.80  0.87     -  0.93  0.80  0.80  0.80  0.80
g     0.69  0.77  0.93  0.93     -  0.77  0.77  0.77  0.77
k     0.88  0.97  0.67  0.77  0.72     -  0.97  0.97  0.99
m     0.88  1.00  0.66  0.76  0.71  0.97     -  1.00  0.97
q     0.88  1.00  0.66  0.76  0.71  0.97  1.00     -  0.97
a     0.87  0.97  0.67  0.77  0.72  0.99  0.97  0.97     -

Optimal Distance Using Assignment Problem

        ao     b     d     e     g     k     m     q     a
ao       -  0.86  0.49  0.54  0.50  0.87  0.86  0.86  0.86
b     0.87     -  0.57  0.63  0.62  0.96  1.00  1.00  0.96
d     0.61  0.64     -  0.85  0.91  0.64  0.64  0.64  0.64
e     0.64  0.69  0.85     -  0.90  0.68  0.69  0.69  0.68
g     0.62  0.68  0.91  0.91     -  0.68  0.68  0.68  0.68
k     0.88  0.96  0.58  0.62  0.61     -  0.96  0.96  0.99
m     0.87  1.00  0.57  0.63  0.62  0.96     -  1.00  0.96
q     0.87  1.00  0.57  0.63  0.62  0.96  1.00     -  0.96
a     0.87  0.96  0.58  0.62  0.61  0.99  0.96  0.96     -

Benign and Malicious Processing Time

% Samples  Benign Time (s)  Malware Time (s)
10                    0.02              0.16
20                    0.02              0.28
30                    0.03              0.30
40                    0.03              0.36
50                    0.06              0.84
60                    0.09              0.94
70                    0.13              0.97
80                    0.25              1.03
90                    0.56              1.31
100                   8.06            585.16

Silvio Cesare presented an effective approach for flowgraph-based malware variant detection. The approach transforms control flow graphs into strings that are then compared using an assignment problem dissimilarity metric for sets of strings. Evaluation on Roron malware variants showed the approach was more effective at detecting variants than previous exact matching approaches. The system was also shown to have low false positive rates and efficient processing times for malware detection. The techniques developed have also been applied to other software analysis tools for similarity detection, bug finding, and more.

12. $d_1(p, q) = \lVert p - q \rVert_1 = \sum_{i=1}^{n} |p_i - q_i|$
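Reading the slide 12 expression as the Manhattan (L1) distance between two n-dimensional feature vectors p and q, a minimal Python sketch could look as follows. The function name `manhattan` and the example vectors are assumptions for illustration; the slide does not say how the vectors are built.

```python
def manhattan(p: list[float], q: list[float]) -> float:
    """L1 distance: the sum of absolute coordinate differences."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same dimension")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical feature vectors, for example counts of three q-grams.
p = [1, 0, 2]
q = [0, 1, 2]
print(manhattan(p, q))  # 2
```

The distance is 0 only when the two vectors are identical, and it grows by one for every unit of difference in any coordinate.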

13. $R = \{r \in D\} \mid \frac{d(r, q)}{|q|} \le 1 - t$
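One possible reading of the slide 13 expression is a range query: return every database entry r whose distance to the query q, normalized by the size of q, is at most 1 - t for a similarity threshold t. The Python sketch below follows that reading. It assumes that d is the Manhattan distance, that |q| is the sum of the absolute values of q's components, and that the names `range_query` and `manhattan` are free to choose; none of these details are confirmed by the slide.

```python
def manhattan(p: list[float], q: list[float]) -> float:
    """L1 distance between two vectors of equal dimension."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def range_query(database: list[list[float]], q: list[float], t: float) -> list[list[float]]:
    """Return entries r with manhattan(r, q) / |q| <= 1 - t (assumed reading)."""
    size = sum(abs(x) for x in q)  # assumed meaning of |q|
    if size == 0:
        return []  # an empty query matches nothing in this sketch
    return [r for r in database if manhattan(r, q) / size <= 1 - t]


# Hypothetical database of feature vectors and a query vector.
database = [[3, 1, 1], [0, 0, 5], [3, 1, 0]]
query = [3, 1, 1]
print(range_query(database, query, t=0.6))  # [[3, 1, 1], [3, 1, 0]]
```

With t = 0.6 the normalized distance must not exceed 0.4, so the exact match and the near match are returned while the vector at normalized distance 1.6 is rejected.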
