Discussion and demonstration of an automated approach
for pairing Memory Forensics with Binary Analysis and
Machine Learning to analyze the execution behavior of
binaries collected from a set of hosts to detect advanced
persistent threats (APT)s that may evade detection by
hooking and "traditional" emulation.
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...Priyanka Aash
Writing a working exploit for a vulnerability is generally challenging, time-consuming, and labor-intensive. To address this issue, automated exploit generation techniques can be adopted. In practice, existing techniques however exhibit an insufficient ability to craft exploits, particularly for the kernel vulnerabilities. On the one hand, this is because their technical approaches explore exploitability only in the context of a crashing process whereas generating an exploit for a kernel vulnerability typically needs to vary the context of a kernel panic. On the other hand, this is due to the fact that the program analysis techniques used for exploit generation are suitable only for simple programs but not the OS kernel which has higher complexity and scalability.
In this talk, we will introduce and release a new exploitation framework to fully automate the exploitation of kernel vulnerabilities. Technically speaking, our framework utilizes a kernel fuzzing technique to diversify the contexts of a kernel panic and then leverages symbolic execution to explore exploitability under different contexts. We demonstrate that this new exploitation framework facilitates exploit crafting from many aspects.
First, it augments a security analyst with the ability to automate the identification of system calls that he needs to take advantages for vulnerability exploitation. Second, it provides security analysts with the ability to achieve security mitigation bypassing. Third, it allows security analysts to automatically generate exploits with different exploitation objectives (e.g., privilege escalation or data leakage). Last but not least, it equips security analysts with an ability to generate exploits even for those kernel vulnerabilities for which the exploitability has not yet been confirmed or verified.
Along with this talk, we will also release many unpublished working exploits against several kernel vulnerabilities. It should be noted that, the vulnerabilities we experimented cover primarily Use-After-Free and heap overflow. Among all these test cases, more than 50% of them do not have working exploits publicly available. To illustrate this release, I have already disclosed one working exploit at my personal website (http://ww9210.cn/). The exploit released on my site pertains to CVE-2017-15649 for which there has not yet been an exploit publicly available with the demonstration of bypassing SMAP.
This presentation talk about some of the challenges in detecting advanced malware which uses evasion techniques such as inline assembly or previously unknown approaches. The presentation also focuses on leveraging the static code analysis as an opportunity to detect these evasive malware in the sandbox
Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...Priyanka Aash
We, Keen Security Lab of Tencent, have successfully implemented two remote attacks on the Tesla Model S/X in year 2016 and 2017. Last year, at Black Hat USA, we presented the details of our first attack chain. At that time, we showed a demonstration video of our second attack chain, but without technical aspects. This year, we are willing to share our full, in-depth details on this research.
In this presentation, we will explain the inner workings of this technology and showcase the new capability that was developed in the Tesla hacking 2017. Multiple 0-days of different in-vehicle components are included in the new attack chain.
We will also present an in-depth analysis of the critical components in the Tesla car, including the Gateway, BCM(Body Control Modules), and the Autopilot ECUs. For instance, we utilized a code-signing bypass vulnerability to compromise the Gateway ECU; we also reversed and then customized the BCM to play the Model X "Holiday Show" Easter Egg for entertainment.
Finally, we will talk about a remote attack we carried out to successfully gain an unauthorized user access to the Autopilot ECU on the Tesla car by exploiting one more fascinating vulnerability. To the best of our knowledge, this presentation will be the first to demonstrate hacking into an Autopilot module.
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...Priyanka Aash
Writing a working exploit for a vulnerability is generally challenging, time-consuming, and labor-intensive. To address this issue, automated exploit generation techniques can be adopted. In practice, existing techniques however exhibit an insufficient ability to craft exploits, particularly for the kernel vulnerabilities. On the one hand, this is because their technical approaches explore exploitability only in the context of a crashing process whereas generating an exploit for a kernel vulnerability typically needs to vary the context of a kernel panic. On the other hand, this is due to the fact that the program analysis techniques used for exploit generation are suitable only for simple programs but not the OS kernel which has higher complexity and scalability.
In this talk, we will introduce and release a new exploitation framework to fully automate the exploitation of kernel vulnerabilities. Technically speaking, our framework utilizes a kernel fuzzing technique to diversify the contexts of a kernel panic and then leverages symbolic execution to explore exploitability under different contexts. We demonstrate that this new exploitation framework facilitates exploit crafting from many aspects.
First, it augments a security analyst with the ability to automate the identification of system calls that he needs to take advantages for vulnerability exploitation. Second, it provides security analysts with the ability to achieve security mitigation bypassing. Third, it allows security analysts to automatically generate exploits with different exploitation objectives (e.g., privilege escalation or data leakage). Last but not least, it equips security analysts with an ability to generate exploits even for those kernel vulnerabilities for which the exploitability has not yet been confirmed or verified.
Along with this talk, we will also release many unpublished working exploits against several kernel vulnerabilities. It should be noted that, the vulnerabilities we experimented cover primarily Use-After-Free and heap overflow. Among all these test cases, more than 50% of them do not have working exploits publicly available. To illustrate this release, I have already disclosed one working exploit at my personal website (http://ww9210.cn/). The exploit released on my site pertains to CVE-2017-15649 for which there has not yet been an exploit publicly available with the demonstration of bypassing SMAP.
This presentation talk about some of the challenges in detecting advanced malware which uses evasion techniques such as inline assembly or previously unknown approaches. The presentation also focuses on leveraging the static code analysis as an opportunity to detect these evasive malware in the sandbox
Over-the-Air: How we Remotely Compromised the Gateway, BCM, and Autopilot ECU...Priyanka Aash
We, Keen Security Lab of Tencent, have successfully implemented two remote attacks on the Tesla Model S/X in year 2016 and 2017. Last year, at Black Hat USA, we presented the details of our first attack chain. At that time, we showed a demonstration video of our second attack chain, but without technical aspects. This year, we are willing to share our full, in-depth details on this research.
In this presentation, we will explain the inner workings of this technology and showcase the new capability that was developed in the Tesla hacking 2017. Multiple 0-days of different in-vehicle components are included in the new attack chain.
We will also present an in-depth analysis of the critical components in the Tesla car, including the Gateway, BCM(Body Control Modules), and the Autopilot ECUs. For instance, we utilized a code-signing bypass vulnerability to compromise the Gateway ECU; we also reversed and then customized the BCM to play the Model X "Holiday Show" Easter Egg for entertainment.
Finally, we will talk about a remote attack we carried out to successfully gain an unauthorized user access to the Autopilot ECU on the Tesla car by exploiting one more fascinating vulnerability. To the best of our knowledge, this presentation will be the first to demonstrate hacking into an Autopilot module.
Jugal Parikh, Microsoft
Holly Stewart, Microsoft
Humans are susceptible to social engineering. Machines are susceptible to tampering. Machine learning is vulnerable to adversarial attacks. Singular machine learning models can be “gamed” leading to unexpected outcomes.
In this talk, we’ll compare the difficulty of tampering with cloud-based models and client-based models. We then discuss how we developed stacked ensemble models to make our machine learning defenses less susceptible to tampering and significantly improve overall protection for our customers. We talk about the diversity of our base ML models and technical details on how they are optimized to handle different threat scenarios. Lastly, we’ll describe suspected tampering activity we’ve witnessed using protection telemetry from over half a billion computers, and whether our mitigation worked.
2012 B-Sides and ToorCon Talk Offensive Defense
Blog Post - http://blog.ioactive.com/2013/01/offensive-defense.html
Cyber-criminals have had back-end infrastructures equivalent to Virus Total to test if malware and exploits are effective against AV scanners for many years, thus showing that attackers are proactively avoiding detection when building malware. In this day of age malicious binaries are generated on demand by server-side kits when a victim visits a malicious web page, making reliance solely on hash based solutions inadequate. In the last 15 years detection techniques have evolved in an attempt to keep up with attack trends. In the last few years security companies have looked for supplemental solutions such as the use of machine learning to detect and mitigate attacks against cyber criminals. Let's not pretend attackers can't bypass each and every detection technique currently deployed. Join me as I present and review current detection methods found in most host and network security solutions found today. We will re-review the defense in depth strategy while keeping in mind that a solid security strategy consists of forcing an attacker to spend as much time and effort while needing to know a variety of skills and technologies in order to successfully pull off the attack. In the end I hope to convince you that thinking defensively requires thinking offensively.
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesPriyanka Aash
Computer malware in all its forms is nearly as old as the first PCs running commodity OSes, dating back at least 30 years. However, the number and the variety of "computing devices" dramatically increased during the last several years. Therefore, the focus of malware authors and operators slowly but steadily started shifting or expanding towards Internet of Things (IoT) malware.
Unfortunately, at present there is no publicly available comprehensive study and methodology that collects, analyses, measures, and presents the (meta-)data related to IoT malware in a systematic and a holistic manner. In most cases, if not all, the resources on the topic are available as blog posts, sparse technical reports, or Systematization of Knowledge (SoK) papers deeply focused on a particular IoT malware strain (e.g., Mirai). Some other times those resources are already unavailable, or can become unavailable or restricted at any time. Moreover, many of such resources contain errors (e.g., wrong CVEs), omissions (e.g., hashes), limited perspectives (e.g., network behaviour only), or otherwise present incomplete or inaccurate analysis. Hence, all these factors leave unattended the main challenges of analysing, tracking, detecting, and defending against IoT malware in a systematic, effective and efficient way.
This work attempts to bridge this gap. We start with mostly manual collection, archival, meta-information extraction and cross-validation of more than 637 unique resources related to IoT malware families. These resources relate to 60 1 IoT malware families, and include 260 resources related to 48 unique vulnerabilities used in the disclosed or detected IoT malware attacks. We then use the extracted information to establish as accurately as possible the timeline of events related to each IoT malware family and relevant vulnerabilities, and to outline important insights and statistics. For example, our analysis shows that the mean and median CVSS scores of all analyzed vulnerabilities employed by the IoT malware families are quite modest yet: 6.9 and 7.1 for CVSSv2, and 7.5 and 7.5 for CVSSv3 respectively. Moreover, the public knowledge to defend against or prevent those vulnerabilities could have been used, on average, at least 90 days before the first malware samples were submitted for analysis. Finally, to help validate our work as well as to motivate its continuous growth and improvement by the research community, we open-source our datasets and release our IoT malware analysis framework and our IoT malware analysis framework.
Automating Analysis and Exploitation of Embedded Device FirmwareMalachi Jones
Dynamic binary analysis tools utilize a combination of techniques that include fuzzing, symbolic execution, and concolic execution to discover exploitable code in sophisticated binaries. Much work has been dedicated to developing automated analysis tools to target mainstream processor architectures (e.g. x86 and x86_64. ). An often overlooked and inadequately addressed area is the development of tools that target embedded systems processors that include PowerPC, MIPS, and SuperH. Historically, a challenge with targeting multiple embedded architectures was that it was often necessary to write an analysis tool for each architecture.
In this talk, we'll discuss an approach for decoupling the architecture specifics from the analysis by utilizing intermediate representation (IR) languages. Intermediate representation languages provide a method to abstract out machine specifics in order to aid in the analysis of computer programs. In particular, the LLVM IR language provides an extensive set of analysis and optimization libraries, along with a JIT engine, that can be collectively utilized to develop architecture-independent automated analysis and exploitation tools.
Vulnerability and Exploit Trends: Combining behavioral analysis and OS defens...EndgameInc
Despite the best efforts of the security community—and big claims from security vendors—large areas of vulnerabilities and exploits remain to be leveraged by adversaries.You will learn about:
- A new perspective on the current state of software flaws.
- The wide margin between disclosed vulnerabilities and
public exploits including a historical analysis and
trending patterns.
- Effective countermeasures that can be deployed to
detect, and prevent, the exploitation of vulnerabilities.
- The limitations of Operating System provided mitigations,
and how a combination of increased countermeasures
with behavioral analysis will get defenders closer to
preventing the largest number of threats.
Security from both sides of the fence – a discussion of techniques, such as fuzzing, to reduce the likelihood of an attacker
discovering exploits on smartphones and PCs;
plus a demonstration of approaches hackers may use to weaponize and exploit vulnerabilities.
This is my presentation in JWC 4th Event Computer and Network Security FOrum at Binus International University. I talk about how to setup your own malware lab for malware analysis purpose.
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareLastline, Inc.
Over the last few years, as the world has moved closer to realizing the idea of the Internet of Things, an increasing number of the analog things with which we used to interact every day have been replaced with connected devices. The increasingly-complex systems that drive these devices have one thing in common – they must all communicate to carry out their intended functionality. Such communication is handled by firmware embedded in the device. And firmware, like any piece of software, is susceptible to a wide range of errors and vulnerabilities.
Understand How Machine Learning Defends Against Zero-Day ThreatsRahul Mohandas
Detection Challenges
Machine Learning Approaches
Modeling Machine Learning classifiers
Attacks on Machine Learning Defenses
Real Protect
Deep Learning in Sandbox
"Быстрое обнаружение вредоносного ПО для Android с помощью машинного обучения...Yandex
В докладе речь пойдёт о применении алгоритмов машинного обучения для обнаружения вредоносных приложений для Android. Я расскажу, как на базе Матрикснета в Яндексе был спроектирован высокопроизводительный инструмент для решения этой задачи. А также продемонстрирую, в каких случаях аналитические методы выявления вредоносного ПО помогают блокировать множество простых образцов вирусного кода. Затем мы поговорим о том, как можно усовершенствовать такие методы для обнаружения более хитроумных вредных программ.
Adversarial machine learning for av softwarejunseok seo
Introduce practical guidances for developing adversarial machine model for anti-malware software. I didn't use reinforcement model yet, just proof-of-concept. If you have any questions about my work, email to me :)
nababora@naver.com
Malware Detection Using Machine Learning TechniquesArshadRaja786
Malware viruses can be easily detected using machine learning Techniques such as K-Mean Algorithms, KNN algorithm, Boosted J48 Decision Tree and other Data Mining Techniques. Among them J48 proved to be more effective in detecting computer virus and upcoming networks worms...
Jugal Parikh, Microsoft
Holly Stewart, Microsoft
Humans are susceptible to social engineering. Machines are susceptible to tampering. Machine learning is vulnerable to adversarial attacks. Singular machine learning models can be “gamed” leading to unexpected outcomes.
In this talk, we’ll compare the difficulty of tampering with cloud-based models and client-based models. We then discuss how we developed stacked ensemble models to make our machine learning defenses less susceptible to tampering and significantly improve overall protection for our customers. We talk about the diversity of our base ML models and technical details on how they are optimized to handle different threat scenarios. Lastly, we’ll describe suspected tampering activity we’ve witnessed using protection telemetry from over half a billion computers, and whether our mitigation worked.
2012 B-Sides and ToorCon Talk Offensive Defense
Blog Post - http://blog.ioactive.com/2013/01/offensive-defense.html
Cyber-criminals have had back-end infrastructures equivalent to Virus Total to test if malware and exploits are effective against AV scanners for many years, thus showing that attackers are proactively avoiding detection when building malware. In this day of age malicious binaries are generated on demand by server-side kits when a victim visits a malicious web page, making reliance solely on hash based solutions inadequate. In the last 15 years detection techniques have evolved in an attempt to keep up with attack trends. In the last few years security companies have looked for supplemental solutions such as the use of machine learning to detect and mitigate attacks against cyber criminals. Let's not pretend attackers can't bypass each and every detection technique currently deployed. Join me as I present and review current detection methods found in most host and network security solutions found today. We will re-review the defense in depth strategy while keeping in mind that a solid security strategy consists of forcing an attacker to spend as much time and effort while needing to know a variety of skills and technologies in order to successfully pull off the attack. In the end I hope to convince you that thinking defensively requires thinking offensively.
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesPriyanka Aash
Computer malware in all its forms is nearly as old as the first PCs running commodity OSes, dating back at least 30 years. However, the number and the variety of "computing devices" dramatically increased during the last several years. Therefore, the focus of malware authors and operators slowly but steadily started shifting or expanding towards Internet of Things (IoT) malware.
Unfortunately, at present there is no publicly available comprehensive study and methodology that collects, analyses, measures, and presents the (meta-)data related to IoT malware in a systematic and a holistic manner. In most cases, if not all, the resources on the topic are available as blog posts, sparse technical reports, or Systematization of Knowledge (SoK) papers deeply focused on a particular IoT malware strain (e.g., Mirai). Some other times those resources are already unavailable, or can become unavailable or restricted at any time. Moreover, many of such resources contain errors (e.g., wrong CVEs), omissions (e.g., hashes), limited perspectives (e.g., network behaviour only), or otherwise present incomplete or inaccurate analysis. Hence, all these factors leave unattended the main challenges of analysing, tracking, detecting, and defending against IoT malware in a systematic, effective and efficient way.
This work attempts to bridge this gap. We start with mostly manual collection, archival, meta-information extraction and cross-validation of more than 637 unique resources related to IoT malware families. These resources relate to 60 1 IoT malware families, and include 260 resources related to 48 unique vulnerabilities used in the disclosed or detected IoT malware attacks. We then use the extracted information to establish as accurately as possible the timeline of events related to each IoT malware family and relevant vulnerabilities, and to outline important insights and statistics. For example, our analysis shows that the mean and median CVSS scores of all analyzed vulnerabilities employed by the IoT malware families are quite modest yet: 6.9 and 7.1 for CVSSv2, and 7.5 and 7.5 for CVSSv3 respectively. Moreover, the public knowledge to defend against or prevent those vulnerabilities could have been used, on average, at least 90 days before the first malware samples were submitted for analysis. Finally, to help validate our work as well as to motivate its continuous growth and improvement by the research community, we open-source our datasets and release our IoT malware analysis framework and our IoT malware analysis framework.
Automating Analysis and Exploitation of Embedded Device FirmwareMalachi Jones
Dynamic binary analysis tools utilize a combination of techniques that include fuzzing, symbolic execution, and concolic execution to discover exploitable code in sophisticated binaries. Much work has been dedicated to developing automated analysis tools to target mainstream processor architectures (e.g. x86 and x86_64. ). An often overlooked and inadequately addressed area is the development of tools that target embedded systems processors that include PowerPC, MIPS, and SuperH. Historically, a challenge with targeting multiple embedded architectures was that it was often necessary to write an analysis tool for each architecture.
In this talk, we'll discuss an approach for decoupling the architecture specifics from the analysis by utilizing intermediate representation (IR) languages. Intermediate representation languages provide a method to abstract out machine specifics in order to aid in the analysis of computer programs. In particular, the LLVM IR language provides an extensive set of analysis and optimization libraries, along with a JIT engine, that can be collectively utilized to develop architecture-independent automated analysis and exploitation tools.
Vulnerability and Exploit Trends: Combining behavioral analysis and OS defens...EndgameInc
Despite the best efforts of the security community—and big claims from security vendors—large areas of vulnerabilities and exploits remain to be leveraged by adversaries.You will learn about:
- A new perspective on the current state of software flaws.
- The wide margin between disclosed vulnerabilities and
public exploits including a historical analysis and
trending patterns.
- Effective countermeasures that can be deployed to
detect, and prevent, the exploitation of vulnerabilities.
- The limitations of Operating System provided mitigations,
and how a combination of increased countermeasures
with behavioral analysis will get defenders closer to
preventing the largest number of threats.
Security from both sides of the fence – a discussion of techniques, such as fuzzing, to reduce the likelihood of an attacker
discovering exploits on smartphones and PCs;
plus a demonstration of approaches hackers may use to weaponize and exploit vulnerabilities.
This is my presentation in JWC 4th Event Computer and Network Security FOrum at Binus International University. I talk about how to setup your own malware lab for malware analysis purpose.
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareLastline, Inc.
Over the last few years, as the world has moved closer to realizing the idea of the Internet of Things, an increasing number of the analog things with which we used to interact every day have been replaced with connected devices. The increasingly-complex systems that drive these devices have one thing in common – they must all communicate to carry out their intended functionality. Such communication is handled by firmware embedded in the device. And firmware, like any piece of software, is susceptible to a wide range of errors and vulnerabilities.
Understand How Machine Learning Defends Against Zero-Day ThreatsRahul Mohandas
Detection Challenges
Machine Learning Approaches
Modeling Machine Learning classifiers
Attacks on Machine Learning Defenses
Real Protect
Deep Learning in Sandbox
"Быстрое обнаружение вредоносного ПО для Android с помощью машинного обучения...Yandex
В докладе речь пойдёт о применении алгоритмов машинного обучения для обнаружения вредоносных приложений для Android. Я расскажу, как на базе Матрикснета в Яндексе был спроектирован высокопроизводительный инструмент для решения этой задачи. А также продемонстрирую, в каких случаях аналитические методы выявления вредоносного ПО помогают блокировать множество простых образцов вирусного кода. Затем мы поговорим о том, как можно усовершенствовать такие методы для обнаружения более хитроумных вредных программ.
Adversarial machine learning for av softwarejunseok seo
Introduce practical guidances for developing adversarial machine model for anti-malware software. I didn't use reinforcement model yet, just proof-of-concept. If you have any questions about my work, email to me :)
nababora@naver.com
Malware Detection Using Machine Learning TechniquesArshadRaja786
Malware viruses can be easily detected using machine learning Techniques such as K-Mean Algorithms, KNN algorithm, Boosted J48 Decision Tree and other Data Mining Techniques. Among them J48 proved to be more effective in detecting computer virus and upcoming networks worms...
Checkmate to crypto malware. Scacco matto ai crypto malwareGianfranco Tonello
How defeat the crypto-malware as CryptoLocker, CryptoWall, CTBLocker, TeslaCrypt and CryptoLocky. In this presentation we shows as VirIT can block the process of crypto-malware, while the malware is encrypting the file of documents and we can save the files that remain. You can see the video of youtube: https://youtu.be/_SyKqqZu6-8
AI approach to malware similarity analysis: Maping the malware genome with a...Priyanka Aash
In recent years, cyber defenders protecting enterprise networks have started incorporating malware code sharing identification tools into their workflows. These tools compare new malware samples to a large databases of known malware samples, in order to identify samples with shared code relationships. When unknown malware binaries are found to share code "fingerprints" with malware from known adversaries, they provides a key clue into which adversary is generating these new binaries, thus helping develop a general mitigation strategy against that family of threats. The efficacy of code sharing identification systems is demonstrated every day, as new family of threats are discovered, and countermeasures are rapidly developed for them. Unfortunately, these systems are hard to maintain, deploy, and adapt to evolving threats. First and foremost, these systems do not learn to adapt to new malware obfuscation strategies, meaning they will continuously fall out of date with adversary tradecraft, requiring, periodically, a manually intensive tuning in order to adjust the formulae used for similarity between malware. In addition, these systems require an up to date, well maintained database of recent threats in order to provide relevant results. Such a database is difficult to deploy, and hard and expensive to maintain for smaller organizations. In order to address these issues we developed a new malware similarity detection approach. This approach, not only significantly reduces the need for manual tuning of the similarity formulate, but also allows for significantly smaller deployment footprint and provides significant increase in accuracy. Our family/similarity detection system is the first to use deep neural networks for code sharing identification, automatically learning to see through adversary tradecraft, thereby staying up to date with adversary evolution. Using traditional string similarity features our approach increased accuracy by 10%, from 65% to 75%. Using an advanced set of features that we specifically designed for malware classification, our approach has 98% accuracy. In this presentation we describe how our method works, why it is able to significantly improve upon current approaches, and how this approach can be easily adapted and tuned to individual/organization needs of the attendees.
(Source: Black Hat USA 2016, Las Vegas)
Battling Unknown Malware with Machine Learning CrowdStrike
Learn about the first signature-less engine to be integrated into VirusTotal. In this CrowdCast deck, CrowdStrike’s Chief Scientist Dr. Sven Krasser offers an exclusive look “under the hood” of this unique machine learning engine, revealing how it works, how it differs from all other signature-based engines integrated into VirusTotal to date, and how it fits into the larger ecosystem of techniques used by CrowdStrike Falcon to keep endpoints and environments safe.
Topics will include:
- What CrowdStrike Falcon machine learning is and how it works
- How to interpret results of machine learning-based threat detection
- How users can benefit from the CrowdStrike Falcon machine learning engine
- How this cutting-edge technology fits into the CrowdStrike Falcon breach prevention platform
Machine Learning for Malware Classification and ClusteringEndgameInc
In this talk, we will give an overview of the machine learning model that is the foundation of Endgame’s automated malware classifier. We will discuss challenges and best approaches to finding a metric that adequately summarizes a model's performance recognizing malware and we will show how model results inform the more tactical analysis of malware researchers.
Talha Obaid, Email Security, Symantec at MLconf ATL 2017MLconf
A Machine Learning approach for detecting a Malware:
The project is to improve the way we detect script based malware using Machine Learning. Malware has become one of the most active channel to deliver threats like Banking Trojans and Ransomware. The talk is aimed at finding a new and effective way to detect the malware. We started with acquiring both malicious and clean samples. Later we performed feature identification, while building on top of existing knowledge base of malware. Then we performed automated feature extraction. After certain feature set is obtained, we teased-out feature which are categorical, interdependent or composite. We applied varying machine learning models, producing both binary and categorical outcomes. We cross validated our results and re-tuned our feature set and our model, until we obtained satisfying results, with least false-positives. We concluded that not all the extracted features are significant, in fact some features are detrimental on the model performance. Once such features are factored-out, it results not only in better match, but also provides a significant gain in performance.
A school project. How well does Metasploit encode its executables? I'm putting it to the test against 10 different antivirus programs. This is just the proposal; the finished version should be done in a month or so.
The Hacking Games - Operation System Vulnerabilities Meetup 29112022lior mazor
Our technology, work processes, and activities all are depend based on Operation Systems to be safe and secure. Join us virtually for our upcoming "The Hacking Games - Operation System Vulnerabilities" Meetup to learn how hacker can compromise Operation System, bypass AntiVirus protection layer and exploiting Linux eBPF.
Tricky sample? Hack it easy! Applying dynamic binary inastrumentation to ligh...Maksim Shudrak
Dynamic binary instrumentation (DBI) is a technique for analysing the behaviour of a binary application at runtime through the injection of instrumentation code. This instrumentation code is designed to be transparent towards the instrumented application and it executes as a part of the normal execution flow without significant runtime overhead. Moreover, there are no limitations for the instrumentation code - a user can implement even a complex logic to observe execution flow, memory layout, etc. Certainly, such a flexible and powerful technique can and should be used for malware analysis. However, while there are several open-source tools (PoCs) implemented on top of DBI frameworks, their application for malware analysis is very limited.
In the talk the author will discuss the pros and cons of malicious code instrumentation and his experience of how DBI can be used to perform investigation of sophisticated banking trojans such as Gootkit and EmbusteBot as well as dozens of other malicious samples in practice.
Moreover, the author will release a new tool for transparent and lightweight dynamic malware analysis and will demonstrate, using examples, how this tool can help researchers to easily reveal important behaviour details of sophisticated malicious samples. EmbusteBot (a new banking trojan from Brazil found and reported by the author in 2017) was investigated using only this tool without even starting a debugger or disassembler.
On August 2017 a well established Corporation was hit by an advanced attacker. The techniques adopted to overcome security platforms and infrastructures showed a very dangerous and innovative attacker. This is the tale of the IR team hired to fight this advanced attacker, a tale of a team pushing all his resources and technical skills to overcome the threat and finally chase the Adder...
The .NET Garbage Collector (GC) is really cool. It helps providing our applications with virtually unlimited memory, so we can focus on writing code instead of manually freeing up memory. But how does .NET manage that memory? What are hidden allocations? Are strings evil? It still matters to understand when and where memory is allocated. In this talk, we’ll go over the base concepts of .NET memory management and explore how .NET helps us and how we can help .NET – making our apps better. Expect profiling, Intermediate Language (IL), ClrMD and more!
Watchtowers of the Internet - Source Boston 2012Stephan Chenette
Watchtowers of the Internet: Analysis of Outbound Malware Communication, Stephan Chenette, Principal Security Researcher, (@StephanChenette) & Armin Buescher, Security Researcher
With advanced malware, targeted attacks, and advanced persistent threats, it’s not IF but WHEN a persistant attacker will penetrate your network and install malware on your company’s network and desktop computers. To get the full picture of the threat landscape created by malware, our malware sandbox lab runs over 30,000 malware samples a day. Network traffic is subsequently analyzed using heuristics and machine learning techniques to statistically score any outbound communication and identify command & control, back-channel, worm-like and other types of traffic used by malware.
Our talk will focus on the setup of the lab, major malware families as well as outlier malware, and the statistics we have generated to give our audience an exposure like never before into the details of malicious outbound communication. We will provide several tips, based on our analysis to help you create a safer and more secure network.
Stephan Chenette is a principal security researcher at Websense Security Labs, specializing in research tools and next generation emerging threats. In this role, he identifies and implements exploit and malcode detection techniques.
Armin Buescher is a Security Researcher and Software Engineer experienced in strategic development of detection/prevention technologies and analysis tools. Graduated as Dipl.-Inf. (MSc) with thesis on Client Honeypot systems. Interested in academic research work and published author of security research papers.
We examine various techniques for the static and dynamic analysis of malware, highlighting potential limitations and pitfalls in the analytic process. For more information about malware detection and removal visit https://www.intertel.co.za
Reversing & Malware Analysis Training Part 9 - Advanced Malware Analysissecurityxploded
This presentation is part of our Reverse Engineering & Malware Analysis Training program.
For more details refer our Security Training page
http://securityxploded.com/security-training.php
EMBA - Firmware analysis - Black Hat Arsenal USA 2022MichaelM85042
IoT (Internet of Things) and OT (Operational Technology) are the current buzzwords for networked devices on which our modern society is based on. In this area, the used operating systems are summarized with the term firmware. The devices themselves, also called embedded devices, are essential in the private and industrial environments as well as in the so-called critical infrastructure.
Penetration testing of these systems is quite complex as we have to deal with different architectures, optimized operating systems and special protocols. EMBA is an open-source firmware analyzer with the goal to simplify and optimize the complex task of firmware security analysis. EMBA supports the penetration tester with the automated detection of 1-day vulnerabilities on binary level. This goes far beyond the plain CVE detection: With EMBA you always know which public exploits are available for the target firmware. Besides the detection of already known vulnerabilities, EMBA also supports the tester on the next 0-day. For this, EMBA identifies critical binary functions, protection mechanisms and services with network behavior on a binary level. There are many other features built into EMBA, such as fully automated firmware extraction, finding file system vulnerabilities, hard-coded credentials, and more.
EMBA is the open-source firmware scanner, created by penetration testers for penetration testers.
Project page: https://github.com/e-m-b-a/emba
Conference page: https://www.blackhat.com/us-22/arsenal/schedule/index.html#emba--open-source-firmware-security-testing-26596
Similar to Automated In-memory Malware/Rootkit Detection via Binary Analysis and Machine Learning (20)
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
Why React Native as a Strategic Advantage for Startup Innovation.pdfayushiqss
Do you know that React Native is being increasingly adopted by startups as well as big companies in the mobile app development industry? Big names like Facebook, Instagram, and Pinterest have already integrated this robust open-source framework.
In fact, according to a report by Statista, the number of React Native developers has been steadily increasing over the years, reaching an estimated 1.9 million by the end of 2024. This means that the demand for this framework in the job market has been growing making it a valuable skill.
But what makes React Native so popular for mobile application development? It offers excellent cross-platform capabilities among other benefits. This way, with React Native, developers can write code once and run it on both iOS and Android devices thus saving time and resources leading to shorter development cycles hence faster time-to-market for your app.
Let’s take the example of a startup, which wanted to release their app on both iOS and Android at once. Through the use of React Native they managed to create an app and bring it into the market within a very short period. This helped them gain an advantage over their competitors because they had access to a large user base who were able to generate revenue quickly for them.
Your Digital Assistant.
Making complex approach simple. Straightforward process saves time. No more waiting to connect with people that matter to you. Safety first is not a cliché - Securely protect information in cloud storage to prevent any third party from accessing data.
Would you rather make your visitors feel burdened by making them wait? Or choose VizMan for a stress-free experience? VizMan is an automated visitor management system that works for any industries not limited to factories, societies, government institutes, and warehouses. A new age contactless way of logging information of visitors, employees, packages, and vehicles. VizMan is a digital logbook so it deters unnecessary use of paper or space since there is no requirement of bundles of registers that is left to collect dust in a corner of a room. Visitor’s essential details, helps in scheduling meetings for visitors and employees, and assists in supervising the attendance of the employees. With VizMan, visitors don’t need to wait for hours in long queues. VizMan handles visitors with the value they deserve because we know time is important to you.
Feasible Features
One Subscription, Four Modules – Admin, Employee, Receptionist, and Gatekeeper ensures confidentiality and prevents data from being manipulated
User Friendly – can be easily used on Android, iOS, and Web Interface
Multiple Accessibility – Log in through any device from any place at any time
One app for all industries – a Visitor Management System that works for any organisation.
Stress-free Sign-up
Visitor is registered and checked-in by the Receptionist
Host gets a notification, where they opt to Approve the meeting
Host notifies the Receptionist of the end of the meeting
Visitor is checked-out by the Receptionist
Host enters notes and remarks of the meeting
Customizable Components
Scheduling Meetings – Host can invite visitors for meetings and also approve, reject and reschedule meetings
Single/Bulk invites – Invitations can be sent individually to a visitor or collectively to many visitors
VIP Visitors – Additional security of data for VIP visitors to avoid misuse of information
Courier Management – Keeps a check on deliveries like commodities being delivered in and out of establishments
Alerts & Notifications – Get notified on SMS, email, and application
Parking Management – Manage availability of parking space
Individual log-in – Every user has their own log-in id
Visitor/Meeting Analytics – Evaluate notes and remarks of the meeting stored in the system
Visitor Management System is a secure and user friendly database manager that records, filters, tracks the visitors to your organization.
"Secure Your Premises with VizMan (VMS) – Get It Now"
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
Even though at surface level ‘java.lang.OutOfMemoryError’ appears as one single error; underlyingly there are 9 types of OutOfMemoryError. Each type of OutOfMemoryError has different causes, diagnosis approaches and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.
top nidhi software solution freedownloadvrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?XfilesPro
Worried about document security while sharing them in Salesforce? Fret no more! Here are the top-notch security standards XfilesPro upholds to ensure strong security for your Salesforce documents while sharing with internal or external people.
To learn more, read the blog: https://www.xfilespro.com/how-does-xfilespro-make-document-sharing-secure-and-seamless-in-salesforce/
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Strategies for Successful Data Migration Tools.pptxvarshanayak241
Data migration is a complex but essential task for organizations aiming to modernize their IT infrastructure and leverage new technologies. By understanding common challenges and implementing these strategies, businesses can achieve a successful migration with minimal disruption. Data Migration Tool like Ask On Data play a pivotal role in this journey, offering features that streamline the process, ensure data integrity, and maintain security. With the right approach and tools, organizations can turn the challenge of data migration into an opportunity for growth and innovation.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
Accelerate Enterprise Software Engineering with PlatformlessWSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Designing for Privacy in Amazon Web ServicesKrzysztofKkol1
Data privacy is one of the most critical issues that businesses face. This presentation shares insights on the principles and best practices for ensuring the resilience and security of your workload.
Drawing on a real-life project from the HR industry, the various challenges will be demonstrated: data protection, self-healing, business continuity, security, and transparency of data processing. This systematized approach allowed to create a secure AWS cloud infrastructure that not only met strict compliance rules but also exceeded the client's expectations.
2. ABOUT ME
Education
• Bachelors Degree: Computer Engineering (Univ. of Florida, 2007)
• Master’s Degree: Computer Engineering (Georgia Tech, 2009)
• PhD: Computer Engineering (Georgia Tech, 2013)
Cyber Security Experience
• PhD Thesis: Asymmetric Information Games & Cyber Security (2009-2013)
• Harris Corp.: Cyber Software Engineer/ Vuln. Researcher (2013-2015)
• Booz Allen Dark Labs: Embedded Security Researcher (2016- Present)
https://www.linkedin.com/in/malachijonesphd
3. ABOUT DARK LABS
Booz Allen Dark Labs is an elite team of security
researchers, penetration testers, reverse engineers,
network analysts, and data scientists, dedicated to
stopping cyber attacks before they occur.1
(1 http://darklabs.bah.com)
4. OUTLINE
i. Efficiently Detecting Dissimilarities in Memory Artifacts
ii. Clustering Memory Artifacts at Scale to Detect Anomalies
dkdkdkd
i. Automating Dynamic Call Trace Generation
ii. Leveraging Call Trace Information to Detect APTs
Motivation
Phase I Detection: Static Analysis
Memory Acquisition Automation
Conclusion/ Q&A
Phase II Detection: Dynamic Analysis
Appendix
6. MOTIVATION
Code injection is a technique utilized by malware to hide malicious code
in a legitimate process and/or library and to force a legitimate process to
perform the execution on the malware’s behalf
Target Process
(Pre-Injection)
PE Injection
Target Process
(Post-Injection)
Free Space Malware
Image Image
Process A
(Pre-Hijack)
DLL Hijacking
Process A
(Post-Hijack)
helper1.dll
Image Image
helper2.dll
helper1.dll
helper2.dll
Malicious dll loaded
instead of legitimate dll
CreateRemoteThread() API call can be
used to spawn a child thread of target
process to execute malicious code
In addition to PE Injection and DLL Hijacking (shown above), other
methods include process hollowing and reflective DLL Injection
7. MOTIVATION
Emulation and Hooking are modern techniques that are employed by anti-virus
(AV) vendors (shown above) to monitor the execution behavior of binaries
executing on a target host
These techniques combined with Execution Behavior Analysis can allow for the
discovery of Advance Persistent Threats (APT)s that leverage advanced code
injection techniques to hide in memory and disguise execution
8. MOTIVATION
Problem: Hooking and “traditional” emulation
techniques can be reliably evaded by APTs
Examples:
• Hooking - Malware can either use lower level unhooked
APIs or remove hooks at runtime
• Emulation- Utilize an incorrectly implemented API
emulation function (e.g. undocumented Windows API)
and detect unexpected output given a specified input
9. MOTIVATION
Observation: A necessary condition for malicious code to be executed is
that the code must reside in memory prior to and during execution
As a consequence, periodic live collection and analysis of memory artifacts
can provide an effective means to identify malware residing on a host
10. OBJECTIVE
Demonstrate an approach to compare the following memory artifacts across a
set of networked hosts in a scalable manner to identify anomalous code:
i. Processes
ii. Shared Libraries
iii. Kernel Modules
iv. Drivers
Specifically, we will discuss how an approximate clustering algorithm with linear
run-time performance can be leveraged to identify outliers among a set of
equivalent types of artifacts (e.g. explorer.exe processes) collected from each
networked host
We will also discuss how Dynamic Binary Analysis can be utilized to improve
detection of sophisticated malware threats
11. MAIN TAKEAWAYS
Phase I : We’ll leverage static analysis to provide us with a computationally
efficient way to rapidly identify memory artifacts with anomalous code
Phase II: Dynamic Analysis will be utilized to differentiate between benign
anomalous code and malware
Static
Analysis
Phase I
Automated Malware Detection Process
Dynamic analysis is leveraged to reduce false positives (from Phase I) and to more accurately identify APTs
Cluster
Anomaly
Detection
Dynamic
Analysis
APT
Detection
Phase II
12. MAIN TAKEAWAYS
Set of Networked Hosts
Host A Host B Host C Host D Host E Host F
Memory Artifacts Sent to Collection Server
Host A Host B Host C Host D Host E Host F
Identical types of artifacts are clustered
Host D Host B
An example type of artifact is an explorer.exe process
Host G
Host G
Host AHost F Host CHost EHost G
Outliers (e.g. Host F) are analyzed dynamically to more accurately identify malicious behavior
An agent can be installed on each host to periodically send memory artifacts to a collection server
13. MAIN TAKEAWAYS
Set of Networked Hosts
Host A Host B Host C Host D Host E Host F
Memory Artifacts Sent to Collection Server
Host A Host B Host C Host D Host E Host F
Identical types of artifacts are clustered
Host D Host B
An example type of artifact is an explorer.exe process
Host G
Host G
Host AHost F Host CHost EHost G
Outliers (e.g. Host F) are analyzed dynamically to more accurately identify malicious behavior
An agent can be installed on each host to periodically send memory artifacts to a collection server
14. MAIN TAKEAWAYS
Set of Networked Hosts
Host A Host B Host C Host D Host E Host F
Memory Artifacts Sent to Collection Server
Host A Host B Host C Host D Host E Host F
Identical types of artifacts are clustered
Host D Host B
An example type of artifact is an explorer.exe process
Host G
Host G
Host AHost F Host CHost EHost G
Outliers (e.g. Host F) are analyzed dynamically to more accurately identify malicious behavior
An agent can be installed on each host to periodically send memory artifacts to a collection server
15. MAIN TAKEAWAYS
Set of Networked Hosts
Host A Host B Host C Host D Host E Host F
Memory Artifacts Sent to Collection Server
Host A Host B Host C Host D Host E Host F
Identical types of artifacts are clustered
Host D Host B
An example type of artifact is an explorer.exe process
Host G
Host G
Host AHost F Host CHost EHost G
Outliers (e.g. Host F) are analyzed dynamically to more accurately identify malicious behavior
An agent can be installed on each host to periodically send memory artifacts to a collection server
17. MEMORY ACQUISITION AUTOMATION
Rekall is an open source tool that can be used to acquire live memory
and perform analysis
We can develop an agent that is deployed on each host that interfaces
with Rekall to collect desired memory artifacts in an automated fashion
Set of Networked Hosts
Memory Artifacts Sent to Collection Server
Host A Host B Host C
An example of process artifacts acquired from physical memory of each host
18. MEMORY ACQUISITION AUTOMATION
Querying for active and terminated processes
pslist command allows for live querying of internal data
structures in memory for active and terminated processes
19. MEMORY ACQUISITION AUTOMATION
Querying for active and terminated processes
pslist command allows for live querying of internal data
structures in memory for active and terminated processes
Terminated process
(notepad++) still in memory
26 days later
20. MEMORY ACQUISITION AUTOMATION
Capturing live dumps of binaries w/ Rekall
procdump dumps a set of specified processes (given pids
as input) to a desired directory
21. MEMORY ACQUISITION AUTOMATION
Capturing live dumps of binaries w/ Rekall
procdump dumps a set of specified processes (given pids
as input) to a desired directory
28. PHASE I DETECTION: STATIC ANALYSIS
Overall Goals:
• Group memory artifacts based on their similarity (as demonstrated
in the above figure) to identify outliers
• Leverage Dynamic Analysis (Phase II) to differentiate between benign
anomalous code and malware
Memory Artifacts Sent to Collection Server
Host A Host B Host C Host D Host E Host F
Identical types of artifacts are clustered
Host D Host B
Host G
Host AHost F Host CHost EHost G
29. PHASE I DETECTION: STATIC ANALYSIS
Requirements for clustering artifacts at scale
• Computationally efficient method for determining the
similarity (or dissimilarity) of a pair of binaries
• Clustering algorithm with a linear run-time performance
in the worst-case
Identical types of artifacts are clustered
Host D Host B Host AHost F Host CHost EHost G
30. PHASE I DETECTION: STATIC ANALYSIS
Computationally Efficient Diffing Algorithm
31. EFFICIENT DIFFING ALGORITHM
Question: Why is efficient binary diffing critical to
our goal of detecting malicious code?
• Slow diffing Slow clustering Delayed threat detection
• We want to be able to cluster a large set of binaries (10,000+)
pretty quickly to identify binaries that pose potential threats
to hosts on the network
• Clustering algorithms need to perform a large number of
diffing operations with respect to the binaries (exact number
depends on run-time algorithm performance)
32. EFFICIENT DIFFING ALGORITHM
main (explorer.exe)
push ebp
mov rbp, rsp
sub rsp, 0x10
Step 1: Generate a vector representation of each binary
Step 2: Compute the similarity of a pair of vectors
explorerA.exe explorerB.exe
Similarity
Function
explorerA.exe
.867
Similarity function takes as input two vectors and produces a value between 0 and 1
Each row of the vector represents the number of occurrences of a unique sequence of instructions
explorerB.exe
(See Appendix D for more details about the algorithm)
33. EFFICIENT DIFFING ALGORITHM
Evaluating similarity function of on-disk copy of
explorer.exe vs. in-memory image
On-Disk
Copy
In-Memory
Copy
36. EFFICIENT DIFFING ALGORITHM
Comparing similarity results against BinDiff
BinDiff:
84% similar
Jaccard Index:
86.7% similar
37. EFFICIENT DIFFING ALGORITHM
Performance vs BinDiff
• BinDiff Analysis : 4.2 seconds
• Jaccard Index Analysis: 106 ms (in pure python)
• Speed up factor of 42
106 ms
39. CLUSTERING ARTIFACTS AT SCALE
Clustering Algorithms
• We’ll utilize an agglomerative hierarchical clustering algorithm
because we don’t have to specify the exact number of clusters, k, a
priori (vs. k-means, where k must be specified)
• Computational complexity for a non-approximated implementation
of the algorithm is , which is not a desirable property for
achieving scalability
• Instead, we’ll use an approximate implementation (presented in [1]),
which has a linear worse-case complexity of
• Note: k is constant and k << n
40. Approximate Clustering Algorithm [1] Sketch
CLUSTERING ARTIFACTS AT SCALE
Set of memory artifacts
Step 1: Prototype Extraction [O(k·n) ]
Step 2: Clustering w/ Prototype [O(k2 log (k) + n) ]
Host A Host B Host C Host D Host E Host F Host G
Host D
Host B
Host F
Host G
Host B Host FHost A
Host A
Host CHost E
Prototypes are a small (k << n), yet representative subset of artifacts
42. PHASE II: DYNAMIC ANALYSIS
Although dynamic analysis can be more accurate (i.e. fewer false-positives)
than static analysis, it can also be more computationally expensive
Therefore, we’ve discussed utilizing static analysis (Phase I) to filter out
binaries based on their similarity.
Consequently, we can focus computationally efforts in Phase II on
differentiating benign anomalous code from APTs via dynamic analysis
Static
Analysis
Phase I
Automated Malware Detection Process
Cluster
Anomaly
Detection
Dynamic
Analysis
APT
Detection
Phase II
43. PHASE II: DYNAMIC ANALYSIS
Key Concept: Binaries utilize system calls and library function calls
(e.g. dlls) to interact with the operating system in order to perform
meaningful/desired operations
Corollary: By analyzing the external call sequences of a binary (e.g.
call trace), the underlying behavior and intent of the binary can be
characterized
46. PHASE II: DYNAMIC ANALYSIS
Case Study: “Sample J” Malware
memset(,0, 0x124u)
CreateToolhelp32Snapshot()
Process32First()
stricmp(., “explorer.exe”)
Process32Next(.,.)
stricmp(., “explorer.exe”)
CreateThread(,,StartAddress)
Sample J Call Trace Example
Call Trace Analysis
i. Iterate through the process handles to see if a process with
name “explorer.exe” exists (Check if user logged in)
ii. If process exists, create a thread that infects target system
47. PHASE II: DYNAMIC ANALYSIS
Questions:
1. How do we automate the call trace generation process in
a manner that maximizes traversal of unique code paths?
2. How do we leverage generated call trace information to
identify potential APTs?
49. AUTOMATING CALL TRACE GENERATION
Call Trace Generation
i. Load the dumped binary into a disassembler (e.g. IDA or Binary
Ninja) and extract the executable portion of binary
ii. Lift the executable portion of binary to vex, a RISC-like
intermediate representation (ir) language
iii. Perform emulation on the vex ir to traverse unique code paths
that originate from a specified entry function
iv. Record calls that occur during traversal of a code path
Disassembler vex ir ir emulation Call Traces
50. AUTOMATING CALL TRACE GENERATION
main()
push ebp
mov rbp, rsp
call foo()
t0 = GET:I64(bp)
t3 = GET:I64(rsp)
t2 = Sub64(t3, 0x008)
PUT(rsp) = t2
Stle(t1) = t0
t3 = GET:I64(rsp)
PUT(bp) = t3
call libfuncfoo()
Disassembler vex ir
main() [vex ir]
emulation
t0 = GET:I64(bp)
t3 = GET:I64(rsp)
t2 = Sub64(t3, 0x008)
PUT(rsp) = t2
Stle(t1) = t0
t3 = GET:I64(rsp)
PUT(bp) = t3
call libfuncfoo()
vex ir
call libfuncfoo()
Instruction
pointer
56. LEVERAGE CALL TRACE INFO TO DETECT APTS
Call Trace 1
…
…
Step 1: Generate a vector representation of the set of call traces
Traces generated via dynamic analysis, where different code paths are traversed to maximize code coverage
Call Trace 2 Call Trace n
……
…
…
…
Step 3: Compare target binary against unified trusted call trace vector
Function output is percentage of call sequences in target that are trusted (1 all sequences trusted)
Target vector Trusted Vector
Comparison
Function
Target vector
Trusted vector
.6
Step 2: Build a unified trusted call trace vector from trusted binaries
+ + + + +
Trusted
Call Trace
vectors
Unified
call trace
vector
57. Call Trace 1
…
…
Step 1: Generate a vector representation of the set of call traces
Traces generated via dynamic analysis, where different code paths are traversed to maximize code coverage
Call Trace 2 Call Trace n
……
…
…
…
Step 2: Build a unified trusted call trace vector from trusted binaries
+
Step 3: Compare target binary against unified trusted call trace vector
Function output is percentage of call sequences in target that are trusted (1 all sequences trusted)
Target vector Trusted Vector
Comparison
Function
Target vector
Trusted vector
.6
+ + + +
Trusted
Call Trace
vectors
Unified
call trace
vector
LEVERAGE CALL TRACE INFO TO DETECT APTS
58. Call Trace 1
…
…
Step 1: Generate a vector representation of the set of call traces
Traces generated via dynamic analysis, where different code paths are traversed to maximize code coverage
Call Trace 2 Call Trace n
……
…
…
…
Step 3: Compare target binary against unified trusted call trace vector
Function output is percentage of call sequences in target that are trusted (1 all sequences trusted)
Target vector Trusted Vector
Comparison
Function
Target vector
Trusted vector
.6
LEVERAGE CALL TRACE INFO TO DETECT APTS
Step 2: Build a unified trusted call trace vector from trusted binaries
+ + + + +
Trusted
Call Trace
vectors
Unified
call trace
vector
59. Call Trace 1
…
…
Step 1: Generate a vector representation of the set of call traces
Traces generated via dynamic analysis, where different code paths are traversed to maximize code coverage
Call Trace 2 Call Trace n
……
…
…
…
Step 3: Compare target binary against unified trusted call trace vector
Function output is percentage of call sequences in target that are trusted (1 all sequences trusted)
Target vector Trusted Vector
Comparison
Function
Target vector
Trusted vector
.6
(See Appendix E for more details about the algorithm)
LEVERAGE CALL TRACE INFO TO DETECT APTS
Step 2: Build a unified trusted call trace vector from trusted binaries
+ + + + +
Trusted
Call Trace
vectors
Unified
call trace
vector
61. CONCLUSION
Phase I : Provide us with a computationally efficient way to rapidly
identify memory artifacts with anomalous code
Phase II: We leverage call trace information to differentiate between
benign anomalous code and malware
Static
Analysis
Phase I
Automated Malware Detection Process
Dynamic analysis is leveraged to reduce false positives (from Phase I) and to more accurately identify APTs
Cluster
Anomaly
Detection
Dynamic
Analysis
APT
Detection
Phase II
62. CONCLUSION
Security is hard… Why not make it even harder for the
adversary?
Specifically, require the adversary to develop techniques
to challenge this memory forensics approach that are
both reliable (e.g. few bugs) and portable (e.g. works
across various versions of the OS and binaries)
63. REFERENCES
1. Rieck, K., Trinius, P., Willems, C., & Holz, T. (2011). Automatic analysis of
malware behavior using machine learning. Journal of Computer Security, 19(4),
639-668.
2. Stuttgen, Johannes & Cohen, Michael (2013). Anti-forensic Resilient Memory
Acquisition. Digit. Investig., 10, S105-S115..
3. Shoshitaishvili, Y., Wang, R., Hauser, C., Kruegel, C., & Vigna, G. (2015,
February). Firmalice-Automatic Detection of Authentication Bypass
Vulnerabilities in Binary Firmware. In NDSS.
4. Koret, Joxean & Bachaalany, Elias (2015). The Antivirus Hacker's Handbook.
Wiley Publishing
5. Carter, K. M., Lippmann, R. P., & Boyer, S. W. (2010, November). Temporally
oblivious anomaly detection on large networks using functional peers. In
Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
(pp. 465-471). ACM.
6. Mosli, R., Li, R., Yuan, B., & Pan, Y. (2016, May). Automated malware detection
using artifacts in forensic memory images. In Technologies for Homeland
Security (HST), 2016 IEEE Symposium on (pp. 1-6). IEEE.
64. REFERENCES
6. Stephens, N., Grosen, J., Salls, C., Dutcher, A., Wang, R., Corbetta, J., ... &
Vigna, G. (2016, February). Driller: Augmenting Fuzzing Through Selective
Symbolic Execution. In NDSS (Vol. 16, pp. 1-16).
7. Liang, S. C. (2016). Understanding behavioural detection of antivirus.
8. Anderson, B., Storlie, C., & Lane, T. (2012, October). Improving malware
classification: bridging the static/dynamic gap. In Proceedings of the 5th ACM
workshop on Security and artificial intelligence (pp. 3-14). ACM. Koret, Joxean
& Bachaalany, Elias (2015).
9. Shoshitaishvili, Y., Wang, R., Salls, C., Stephens, N., Polino, M., Dutcher, A., ... &
Vigna, G. (2016, May). Sok:(state of) the art of war: Offensive techniques in
binary analysis. In Security and Privacy (SP), 2016 IEEE Symposium on (pp. 138-
157). IEEE.
10. Ravi, C., & Manoharan, R. (2012). Malware detection using windows api
sequence and machine learning. International Journal of Computer
Applications, 43(17), 12-16.
65. Q&A
How different are equivalent types of memory artifacts (
originating from identical binaries) across multiple hosts?
• The theory and the empirical results* suggest that memory artifacts
are almost identical (+99% similar based on empirical results)
• *Under the following assumptions
i. The hosts are not under significant memory pressure
ii. Identical versions of the host operating system
66. Q&A
Can we use hashes to determine if memory artifacts across
hosts are identical (e.g. identical explorer.exe process
artifacts on Hosts A & B)?
• No. Chunks of the binary may not be in memory because the OS has
likely paged those sections to disk in order to efficiently
utilize/manage memory
• Also, the binary is likely memory mapped. Therefore, the OS may
take a lazy approach by loading a particular chunk when needed
• As a consequence, the binary on-disk will be different then its image
that resides in memory. Memory artifacts across hosts are very
likely to also be different.
69. MACHINE LEARNING
Key Concepts
• Unsupervised Learning: Inferring a function to describe hidden
structure from "unlabeled" data
• Supervised Learning: Inferring a function from labeled training
data
• Clustering: Grouping a set of objects in such a way that objects in
the same group (called a cluster) are more similar
• Classification: Identifying to which of a set of categories (sub-
populations) a new observation belongs
70. MACHINE LEARNING
Challenges/Steps
1. (Non-Trivial) Representing observed/collected data in a
meaningful mathematical expressions (e.g. vector)
2. Deciding on a metric for measuring the similarity of
observations (e.g. Jaccard Similarity function)
3. Selecting a suitable algorithm that can classify and/or cluster
observations appropriately using a supervised or
unsupervised approach (e.g. agglomerative hierarchical
clustering)
72. MACHINE LEARNING
1. Representing Observations Mathematically
• We’ll represent the contents of each square as a vector where each
dimension represents the number of occurrences of a color
• Vector representations
Pink
Green
Blue
0
0
3
1
1
1
0
2
1
73. MACHINE LEARNING
2. Metric for measuring similarity of observations
• We’ll use the Jaccard Index to measure similarity
• Example:
S2 S3
74. MACHINE LEARNING
3. Selecting a suitable algorithm
• We’ll use a hierarchical clustering algorithm
• Depending on the input parameters of the algorithm,
the clusters could look like the following
S1 S2 S3
76. BINARY ANALYSIS
Static Analysis: Analysis of computer software that
is performed without the actual execution of the
software code
Dynamic Analysis: Execution of software in an
instrumented or monitored manner to garner more
concrete information on behavior
77. BINARY ANALYSIS
Static vs. Dynamic Analysis
• Static analysis scales well and can provide better code
coverage of a binary
• Dynamic analysis can provide more accurate information
on the actual execution behavior of a binary
• Static analysis can produce false execution behavior as
code paths may not be reachable during actual execution
• Dynamic analysis can be computationally expensive
78. BINARY ANALYSIS
Advanced analysis techniques
• Symbolic Execution: Analysis of a program to determine the
necessary inputs needed to reach a particular code path.
Variables modeled as symbols
• Concolic Execution: Used in conjunction with symbolic
execution to generate concrete inputs (test cases) from
symbolic variables to feed into program
• Selective Concolic Execution: Selectively leverage concolic
execution when fuzzing engine gets “stuck” (i.e. unable to
generate inputs that can traverse a desired code path)
79. BINARY ANALYSIS
Motivational example for Symbolic Execution
int main(void) {
char buf[32];
char *data = read_string();
unsigned int magic = read_number();
// difficult check for fuzzing
if (magic == 0x31337987) {
// Bad stuff
doBadStuff();
}
else if(magic < 100 && magic % 15 == 2 && magic % 11 == 6) {
// Only solution is 17;
doReallyBadStuff();
}
else{
doBenignStuff();
}
80. BINARY ANALYSIS
Motivational example for Symbolic Execution
int main(void) {
char buf[32];
char *data = read_string();
unsigned int magic = read_number();
// difficult check for fuzzing
if (magic == 0x31337987) {
// Bad stuff
doBadStuff();
}
else if(magic < 100 && magic % 15 == 2 && magic % 11 == 6) {
// Only solution is 17;
doReallyBadStuff();
}
else{
doBenignStuff();
}
Symbolic execution allows us
to figure out the conditions
(i.e. magic=0x31337987) to
exercise this code path
A more sophisticated code path that
can be reached via symbolic execution
81. BINARY ANALYSIS
Analyzing the call traces of example
• Call Trace A: Likely result if utilizing traditional emulation
techniques to analyze sophisticated malware
• Call Trace B: The more useful trace for identifying
potential malicious behavior of a binary as a result of
applying advanced binary analysis techniques
main()
doBenignStuff()
Call Trace A
main()
doReallyBadStuff()
Call Trace B
83. MEMORY FORENSICS
Defined: Analysis of a computer’s memory dump
Memory Acquisition
• Refers to the process of accessing the physical memory
• Most critical step in the memory forensics process
• Software and hardware tools can be used during acquisition,
but we’ll focus on the former
Rekall and Lime provide open source acquisition tools
Volatility and Rekall provide open source analysis tools
85. Virtual Addressing and Memory Acquisition (cont’d)
Physical space may be smaller than virtual address space.
Less recently used memory blocks (a.k.a. pages) are moved to disk
Important: Only data that is in physical memory during acquisition
can be acquired; paged data is unavailable
MEMORY FORENSICS
Process A Physical Memory
0x1FF9612
0x12467
Disk
86. MEMORY FORENSICS
Anti-Forensics
• Any attempt to compromise the availability or usefulness of evidence to
the forensic process
• Techniques include
i. Substitution Attack: Data fabricated by the attacker is substituted in place of
valid data during the acquisition
ii. Disruption Attack: Disrupt the acquisition process
• Proof of Concepts:
i. “ShadowWalker” @ Blackhat 2005
ii. “Low Down and Dirty” @ Blackhat 2006
iii. “Defeating Windows Forensics” @Fahrplan 2012
• The presented approach is resilient to anti-forensics techniques due to
information asymmetry on the side of the defender
88. BINARY VECTOR GENERATION
Binary Vector Generation
i. Load the dumped binary into a disassembler (e.g. IDA or Binary Ninja)
and extract the executable portion of binary
ii. Lift the executable portion of binary to a RISC-like intermediate
representation (ir)
iii. Create a binary behavior report that categorizes each ir statement
iv. Generate a set of fixed-sized linear sequences (w.r.t. address space) of
ir statements that will be referred to as behavior sequences
v. Create a map that maps each unique sequence to a row in a vector
Disassembler
RISC-like
instructions
Behavior
Report
Behavior
Sequences
89. BINARY VECTOR GENERATION
main()
push ebp
mov rbp, rsp
sub rsp, 0x10
reg_access
reg_access
arithmetic
reg_access
store
reg_access
reg_access
reg_access
other
arithmetic
reg_access
t0 = GET:I64(bp)
t3 = GET:I64(rsp)
t2 = Sub64(t3, 0x008)
PUT(rsp) = t2
Stle(t1) = t0
t3 = GET:I64(rsp)
PUT(bp) = t3
t4 = GET:I64(rsp)
t5 = 0x10
t6 = Sub64(t4,t5)
PUT(rsp) = t6
Report
Disassembler
RISC-like
Instructions
Behavior
Report
RISC-like (vex ir )
96. BINARY VECTOR GENERATION
Total possible behavior sequence permutations: ~ 277
• Number of instruction categories (e.g. store and branch): 14
• Length of behavior sequence: 20
• 1420 ~ 277
Naive Approach
• Create a 1420 dimensional vector to express binary behavior
• Each vector dimension maps to a unique behavior sequence
• Number stored at dimension k is the number of times
behavior sequence occurs in executable
97. BINARY VECTOR GENERATION
Naive Approach (continued…)
Behavior Vector
Index 00
1
20 category
items
call
call
call
call
……….
call
Index 1420 -1
reg_access
reg_access
reg_access
reg_access
…….
reg_access
k
1420-1
……………………………..……………………………..
xxx
xxx
xxx
xxx
……….
xxx
Index k
3
6
m
0
Number of occurrences m of
behavior sequence k
98. BINARY VECTOR GENERATION
Naive Approach (continued…)
• Not very practical to implement directly
• Fortunately, we can do better
Key observation: Vector is sparse in that most of the
dimensions will store the number ‘0’
Better Approach
• Only store the non-zero elements in memory
• So if a binary has N total ir statements, then we only need to
store at most N-window_length elements in memory
100. EFFICIENT DIFFING ALGOIRHTM
Selecting a similarity function
• For convenience, we’ll use the Jaccard Index
• The Jaccard Index has the following interpretation:
(See Appendix A for a simple example using the Jaccard Index)
explorerA.exe explorerB.exe
Similarity
Function
explorerA.exe
.867
Similarity function takes as input two vectors and produces a value between 0 and 1
explorerB.exe
Step 2: Compute the similarity of a pair of vectors
102. CALL TRACE VECTOR GENERATION
Call Trace Vector Generation
i. Load the dumped binary into a disassembler (e.g. IDA or Binary
Ninja) and extract all sections of the binary
ii. Lift the executable portion of binary to a RISC-like intermediate
representation (ir)
iii. Perform Dynamic analysis on the ir to generate call traces
iv. Generate a call trace vector that maps unique call trace sequences
into a row of the vector
Disassembler
RISC-like
instructions
Call Trace
Generation
(Dyn. Analysis)
Call Trace
Vector
108. CALL TRACE VECTOR GENERATION
Trace Call Vector
Trace Call Vector
Index 00
1
….
call
call
call
call
……….
call
Index 1420 -1
main()
LoadLibraryW()
RegisterShellHook()
k
2128-1
……………………………..……………………………..
xxx
xxx
xxx
xxx
……….
xxx
Index k
3
6
m
0
Number of occurrences m of
trace sequence k
111. HOOKING
Monitoring Behavior w/ Hooks
• Detouring a number of common APIs (e.g. CreateFile) to
ensure monitoring code is executed before actual code
• Depending on a set of rules ( typically dynamic), the monitor
code blocks, allows, or reports execution of API
CreateFileA (Original)
mov edi, edi
push ebp
mov ebp, esp
CreateFileA (Hooked)
jmp MonitorCode
push ebp
mov ebp, esp
Monitor Code
inc dword ptr [edx+8]
.......
jmp CreateFileA + 5
jmp MonitorCode
Modified
Instruction