Silvio Cesare is a PhD candidate at Deakin University researching malware detection and automated vulnerability discovery. His current work extends his Masters research on fast automated unpacking and classification of malware, which he presented at Ruxcon 2010. The system uses control flow graphs and q-grams of decompiled code as "birthmarks" to detect unknown malware samples that are suspiciously similar to known malware, reducing the number of signatures required. Against a database of 10,000 malware samples, a scan of 1,601 benign binaries produced 10 false positives (less than 1%). The system is more effective and more efficient than his 2010 work.
Faster, More Effective Flowgraph-based Malware Classification
2. Ph.D. Candidate at Deakin University.
Research
◦ Malware detection.
◦ Automated vulnerability discovery (check out my other talk in the main conference).
Did a Masters by research in malware
◦ “Fast automated unpacking and classification of malware”.
◦ Presented last year at Ruxcon 2010.
This current work extends last year’s work.
3. Traditional AV works well on known samples.
Doesn’t detect unknown samples.
Doesn’t detect “suspiciously similar” samples.
Uses strings as a signature or “birthmark”.
Compares birthmarks by equality.
4. Birthmarks can be program structure.
Structure is more static among malware variants.
Birthmarks can be compared using “approximate similarity”.
Able to detect unknown samples that are suspiciously similar to known malware.
Vastly reduces the number of required signatures.
5. [Diagram: Program p and Program q are each reduced to a birthmark. The two birthmarks are compared (“Similar?”), and the outcome is either “MATCH!” or “Different”.]
6. Control flow is more invariant among polymorphic and metamorphic malware.
A directed graph representing control flow.
A control flow graph for every procedure.
One call graph per program.
7. lea 0x4(%esp),%ecx
and $0xfffffff0,%esp
pushl -0x4(%ecx)
push %ebp
mov %esp,%ebp
push %ecx
sub $0x24,%esp
call 4011b0 <___main>
movl $0x0,-0x8(%ebp)
jmp 40115f <_main+0x2f>
movl $0x4020a0,(%esp)
call 4011b8 <_puts>
addl $0x1,-0x8(%ebp)
cmpl $0x9,-0x8(%ebp)
jle 40114f <_main+0x1f>
add $0x24,%esp
pop %ecx
pop %ebp
lea -0x4(%ecx),%esp
ret
[Diagram labels: Proc_0, Proc_1, Proc_2, Proc_3, Proc_4]
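As a rough illustration of the control flow graph for the listing on this slide, the sketch below splits it into four basic blocks and finds the loop by looking for a back edge. The block names (`entry`, `body`, `cond`, `exit`) and the `back_edges` helper are hypothetical and are not taken from the talk. The block boundaries are read off the `jmp 40115f` and `jle 40114f` targets.

```python
# Control flow graph of the listing as an adjacency list.
# Block names are hypothetical labels, not names used by the author.
cfg = {
    "entry": ["cond"],          # prologue, ends with jmp 40115f
    "body": ["cond"],           # 40114f: call to puts, counter increment, falls through
    "cond": ["body", "exit"],   # 40115f: cmpl/jle 40114f, otherwise falls through
    "exit": [],                 # epilogue, ends with ret
}


def back_edges(graph, entry):
    """Depth-first search; an edge to a node still on the DFS stack is a back edge."""
    visited, on_stack, found = set(), set(), []

    def visit(node):
        visited.add(node)
        on_stack.add(node)
        for succ in graph[node]:
            if succ in on_stack:
                found.append((node, succ))
            elif succ not in visited:
                visit(succ)
        on_stack.discard(node)

    visit(entry)
    return found


print(back_edges(cfg, "entry"))  # [('body', 'cond')]
```

The single back edge from the loop body to the loop condition is the kind of structure that stays the same when instructions are substituted or reordered within a block.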
8. Known as the “Graph Isomorphism” problem.
Identifies equivalent “structure”.
In NP, but not known to be NP-complete, and no polynomial time algorithm is known.
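To make the cost concrete, here is a brute-force isomorphism check for small directed graphs. It is a minimal sketch, not the method used in the talk: it tries every mapping of nodes, so the work grows factorially with the number of nodes. The function name and graph encoding are assumptions made for illustration.

```python
from itertools import permutations


def isomorphic(nodes_a, edges_a, nodes_b, edges_b):
    """Brute-force test: try every bijection from nodes_a to nodes_b."""
    edges_a, edges_b = set(edges_a), set(edges_b)
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return False
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        # Edge counts are equal, so mapping every edge of A onto an edge of B
        # means the two edge sets correspond exactly.
        if all((mapping[u], mapping[v]) in edges_b for u, v in edges_a):
            return True
    return False


path = ([0, 1, 2], [(0, 1), (1, 2)])
renamed_path = (["x", "y", "z"], [("y", "z"), ("z", "x")])
star = (["x", "y", "z"], [("x", "y"), ("x", "z")])

print(isomorphic(*path, *renamed_path))  # True
print(isomorphic(*path, *star))          # False
```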
9. The number of basic operations applied to a graph to transform it to another graph.
If you know the distance between two objects, you know the similarity.
Computing it is NP-hard and infeasible in practice.
11. Input is a string.
Extract all substrings of fixed size Q.
Substrings are known as q-grams.
Let’s take q-grams of all decompiled graphs.
E.g. the string W|IEH}R has the 4-grams W|IE, |IEH, IEH} and EH}R.
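The q-gram extraction on this slide can be sketched in a few lines. The function name `qgrams` is an assumption. The input string is the one shown on the slide.

```python
def qgrams(s: str, q: int) -> list[str]:
    """Return every substring of s with length q, in order of position."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


print(qgrams("W|IEH}R", 4))  # ['W|IE', '|IEH', 'IEH}', 'EH}R']
```

A string shorter than q yields no q-grams, which is consistent with the later remark that very small binaries give weak matches.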
12. An array <E1,...,En>.
A feature vector describes the number of occurrences of each feature.
Ei is the number of times feature i occurs.
Let’s make the 500 most common q-grams the features.
We use feature vectors as birthmarks.
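A minimal sketch of turning q-grams into a feature vector, assuming the features are chosen by counting q-grams over a corpus of strings. The function names and the corpus are hypothetical. The talk uses the 500 most common q-grams, while the usage line below uses a tiny hand-picked feature list so the result is easy to check.

```python
from collections import Counter


def count_qgrams(s, q):
    """Count how often each q-gram occurs in s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def top_features(corpus, q, k):
    """Pick the k most common q-grams across all strings in the corpus."""
    totals = Counter()
    for s in corpus:
        totals.update(count_qgrams(s, q))
    return [gram for gram, _ in totals.most_common(k)]


def feature_vector(s, q, features):
    """Entry i is the number of times features[i] occurs in s."""
    counts = count_qgrams(s, q)
    return [counts[f] for f in features]


print(feature_vector("ABABAB", 2, ["AB", "BA", "ZZ"]))  # [3, 2, 0]
```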
13. A vector is an n-dimensional point.
E.g. a 2D vector is <x,y>.
Fast.
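The transcript does not name the distance used between vectors. As an assumption for illustration only, the sketch below uses the Manhattan (L1) distance, which is a metric and is cheap to compute on count vectors.

```python
def manhattan(u, v):
    """L1 distance between two equal-length vectors (assumed metric, not confirmed by the slides)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


print(manhattan([1, 2], [4, 6]))  # 7
```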
14. Software similarity problem extended to similarity search over a database.
Find nearest neighbours (by distance) of a query.
Or find neighbours within a distance of the query.
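Both kinds of query can be sketched with a brute-force scan. This is only a baseline for illustration: the database layout (a dict of name to vector), the function names, and the Manhattan distance are all assumptions, not details from the talk.

```python
import heapq


def manhattan(u, v):
    """L1 distance, assumed here for illustration."""
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_neighbours(db, query, k):
    """Return the k entries closest to the query as (name, distance) pairs."""
    scored = ((name, manhattan(vec, query)) for name, vec in db.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


def range_query(db, query, r):
    """Return every entry within distance r of the query."""
    return [(name, d) for name, vec in db.items()
            if (d := manhattan(vec, query)) <= r]


db = {"a": [0, 0], "b": [1, 1], "c": [5, 5]}
print(nearest_neighbours(db, [0, 0], 2))  # [('a', 0), ('b', 2)]
print(range_query(db, [0, 0], 2))         # [('a', 0), ('b', 2)]
```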
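15. [Diagram: range queries of radius r around a query q, with malware points p in the database and the distance d(p,q). Labels: Query Benign, Query Malicious, Query, Malware.]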
16. The vector distance used here is a “metric”.
It has the mathematical properties of a metric: non-negativity, symmetry and the triangle inequality.
This means you can do a nearest neighbour search without brute forcing the entire database!
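One simple way to see why a metric avoids a full scan is pivot filtering. By the triangle inequality, d(q,x) is at least |d(q,pivot) - d(x,pivot)|, so any entry for which that bound exceeds the radius can be skipped without computing its distance. The sketch below is an assumption-laden illustration (single pivot, Manhattan distance, hypothetical names), not the index structure used in the system.

```python
def manhattan(u, v):
    """L1 distance, assumed here for illustration."""
    return sum(abs(a - b) for a, b in zip(u, v))


def build_index(db, pivot):
    """Precompute each entry's distance to the pivot once."""
    return {name: manhattan(vec, pivot) for name, vec in db.items()}


def range_query(db, index, pivot, query, r):
    """Return (hits, number of full distance computations performed)."""
    dq = manhattan(query, pivot)
    hits, computed = [], 0
    for name, vec in db.items():
        # Triangle inequality lower bound: d(query, vec) >= |dq - index[name]|.
        if abs(dq - index[name]) > r:
            continue
        computed += 1
        if manhattan(query, vec) <= r:
            hits.append(name)
    return hits, computed


db = {"a": [0, 0], "b": [1, 1], "c": [10, 10]}
pivot = [0, 0]
index = build_index(db, pivot)
print(range_query(db, index, pivot, [1, 0], 1))  # (['a', 'b'], 2)
```

Entry `c` is rejected from its precomputed pivot distance alone, so only two of the three distances are computed.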
17. System is 100,000 lines of C++.
The modules for this work are under 3,000 lines of code.
System translates x86 into an intermediate language (IL).
Performs analysis on the architecture-independent IL.
Unpacks malware using an application level emulator.
18. Database of 10,000 malware samples.
Scanned 1,601 benign binaries.
10 false positives. Less than 1%.
Using an additional refinement algorithm, reduced to 7 false positives.
Very small binaries have small signatures and cause weak matching.
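The false positive rates follow directly from the counts on this slide. The helper name is an assumption.

```python
def false_positive_rate(false_positives, benign_scanned):
    """Fraction of benign binaries wrongly flagged as malware."""
    return false_positives / benign_scanned


print(f"{false_positive_rate(10, 1601):.2%}")  # 0.62%
print(f"{false_positive_rate(7, 1601):.2%}")   # 0.44%
```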
19. Calculated similarity between Roron malware variants.
Compared results to Ruxcon 2010 work.
In tables, highlighted cells indicate a positive match.
The more matches, the more effective it is.
20. Exact Matching (Ruxcon 2010)
      ao    b     d     e     g     k     m     q     a
ao    -     0.44  0.28  0.27  0.28  0.55  0.44  0.44  0.47
b     0.44  -     0.27  0.27  0.27  0.51  1.00  1.00  0.58
d     0.28  0.27  -     0.48  0.56  0.27  0.27  0.27  0.27
e     0.27  0.27  0.48  -     0.59  0.27  0.27  0.27  0.27
g     0.28  0.27  0.56  0.59  -     0.27  0.27  0.27  0.27
k     0.55  0.51  0.27  0.27  0.27  -     0.51  0.51  0.75
m     0.44  1.00  0.27  0.27  0.27  0.51  -     1.00  0.58
q     0.44  1.00  0.27  0.27  0.27  0.51  1.00  -     0.58
a     0.47  0.58  0.27  0.27  0.27  0.75  0.58  0.58  -

Heuristic Approximate Matching (Ruxcon 2010)
      ao    b     d     e     g     k     m     q     a
ao    -     0.70  0.28  0.28  0.27  0.75  0.70  0.70  0.75
b     0.74  -     0.31  0.34  0.33  0.82  1.00  1.00  0.87
d     0.28  0.29  -     0.50  0.74  0.29  0.29  0.29  0.29
e     0.31  0.34  0.50  -     0.64  0.32  0.34  0.34  0.33
g     0.27  0.33  0.74  0.64  -     0.29  0.33  0.33  0.30
k     0.75  0.82  0.29  0.30  0.29  -     0.82  0.82  0.96
m     0.74  1.00  0.31  0.34  0.33  0.82  -     1.00  0.87
q     0.74  1.00  0.31  0.34  0.33  0.82  1.00  -     0.87
a     0.75  0.87  0.30  0.31  0.30  0.96  0.87  0.87  -

Q-Grams
      ao    b     d     e     g     k     m     q     a
ao    -     0.86  0.53  0.64  0.59  0.86  0.86  0.86  0.86
b     0.88  -     0.66  0.76  0.71  0.97  1.00  1.00  0.97
d     0.65  0.72  -     0.88  0.93  0.73  0.72  0.72  0.73
e     0.72  0.80  0.87  -     0.93  0.80  0.80  0.80  0.80
g     0.69  0.77  0.93  0.93  -     0.77  0.77  0.77  0.77
k     0.88  0.97  0.67  0.77  0.72  -     0.97  0.97  0.99
m     0.88  1.00  0.66  0.76  0.71  0.97  -     1.00  0.97
q     0.88  1.00  0.66  0.76  0.71  0.97  1.00  -     0.97
a     0.87  0.97  0.67  0.77  0.72  0.99  0.97  0.97  -
21. Faster than Ruxcon 2010.
Median benign processing time is 0.06s.
Median malware processing time is 0.84s.
Slowest result may be due to memory thrashing.
% Samples   Benign Time(s)   Malware Time(s)
10          0.02             0.16
20          0.02             0.28
30          0.03             0.30
40          0.03             0.36
50          0.06             0.84
60          0.09             0.94
70          0.13             0.97
80          0.25             1.03
90          0.56             1.31
100         8.06             585.16
22. Improved effectiveness and efficiency compared to Ruxcon 2010.
Runs in real-time in expected case.
Large functional code base and years of development time.
Happy to talk to vendors.
23. Full academic paper at IEEE Trustcom.
Research page http://www.foocodechu.com
Book on “Software similarity and classification” available in 2012.
Wiki on software similarity and classification http://www.foocodechu.com/wiki