Introduction Academic Contributions Moving Forward Conclusions
How do we detect malware?
A step-by-step guide
Marcus Botacin
botacin@tamu.edu
marcusbotacin.github.io
How do we detect malware? 1 / 46 TAMU
Who Am I?
Assistant Professor (2022) - Texas A&M University (TAMU), USA
ACES Program Fellowship
PhD. in Computer Science (2021) - Federal University of Paraná (UFPR), Brazil
Thesis: “On the Malware Detection Problem: Challenges and new Approaches”
MSc. in Computer Science (2017) - University of Campinas (UNICAMP), Brazil
Dissertation: “Hardware-Assisted Malware Analysis”
Computer Engineer (2015) - University of Campinas (UNICAMP), Brazil
Final Project: “Malware detection via syscall patterns identification”
Malware
Topics
1 Introduction
Malware
Malware Detection
2 Academic Contributions
Examples
3 Moving Forward
Research Opportunities
4 Conclusions
Recap & Remarks
The Malware Problem
How have we been doing? (Overall)
The good side
Figure: https://www.paysafe.com/en/blog/do-consumers-trust-online-payments-more-now-than-before-covid-19/
The bad side
Figure: https://www.ncr.com/blogs/payments/credit-card-fraud-detection
How have we been doing? (Malware Specifics)
The good side
Figure: https://apnews.com/article/europe-malware-netherlands-coronavirus-pandemic-7de5f74120a968bd0a5bee3c57899fed
The bad side
Figure: https://thehackernews.com/2021/06/droidmorph-shows-popular-android.html
Malware Detection
How Do We Detect Malware?
The State-of-the-art in Malware Detection & Prevention
Steps
1 Collection
2 Triage
3 Sandbox Analysis
4 Threat Intelligence
5 Endpoint Protection
Distributed Processing: Collection
Cloud Processing: Analysis and Intelligence steps
Limited Processing: Endpoint
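The five steps chain naturally into one another. A minimal Python sketch, assuming each sample is a byte string: only the exact-duplicate filtering in triage (by SHA-256) is a real technique here, while the hypothetical `sandbox`, `threat_intel`, and `endpoint_blocklist` functions are placeholders that only show where each step's output flows.

```python
import hashlib
from typing import Iterable


def collect(feeds: Iterable[Iterable[bytes]]) -> list[bytes]:
    """1 Collection: merge samples from every feed (honeypots, spam traps, crawlers...)."""
    return [sample for feed in feeds for sample in feed]


def triage(samples: list[bytes]) -> list[bytes]:
    """2 Triage: keep one copy per SHA-256 so later, costlier steps see each file once."""
    seen: set[str] = set()
    unique = []
    for sample in samples:
        digest = hashlib.sha256(sample).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique


def sandbox(sample: bytes) -> dict:
    """3 Sandbox Analysis (hypothetical placeholder): records only hash and size."""
    return {"sha256": hashlib.sha256(sample).hexdigest(), "size": len(sample)}


def threat_intel(reports: list[dict]) -> dict:
    """4 Threat Intelligence (hypothetical): aggregate per-sample reports into statistics."""
    return {"unique_samples": len(reports), "total_bytes": sum(r["size"] for r in reports)}


def endpoint_blocklist(reports: list[dict]) -> set[str]:
    """5 Endpoint Protection (hypothetical): ship a cheap artifact, a hash blocklist."""
    return {r["sha256"] for r in reports}


def run_pipeline(feeds):
    # Steps 2 to 4 run in the cloud; only the blocklist reaches the endpoint.
    reports = [sandbox(s) for s in triage(collect(feeds))]
    return threat_intel(reports), endpoint_blocklist(reports)
```

For instance, `run_pipeline([[b"aaa", b"bb"], [b"aaa"]])` analyzes two unique samples, since triage drops the repeated `b"aaa"` before the sandbox step.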
Collection
How to find new malware samples?
Searching “dark web” forums.
Crawling software repositories.
Leveraging honeypots.
Checking spam traps.
Downloading malware repositories.
Scraping blocklists.
The result
Figure: https://www.forbes.com/sites/thomasbrewster/2021/09/29/google-play-warning-200-android-apps-stole-millions-from-10-million-phones/
Triage
Why so many new malware samples?
Variations from the same source code.
Implications
Increased processing costs and response time.
How to solve this problem?
Identify and cluster similar samples.
The Statistics
Figure: https://www.kaspersky.com/about/press-releases/2020_the-number-of-new-malicious-files-detected-every-day-increases-by-52-to-360000-in-2020
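Clustering similar samples needs a similarity score and a grouping rule. This sketch swaps in byte 4-gram Jaccard similarity as a stand-in for a similarity hash such as ssdeep; the helper names and the 0.5 threshold are illustrative assumptions, not a production triage method.

```python
def ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """Set of overlapping byte n-grams (a stand-in for a similarity hash)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set[bytes], b: set[bytes]) -> float:
    """Similarity score in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(samples: dict[str, bytes], threshold: float = 0.5) -> list[list[str]]:
    """Greedy single pass: join the first cluster whose representative (its first
    member) is at least `threshold` similar, otherwise start a new cluster."""
    clusters: list[tuple[set[bytes], list[str]]] = []
    for name, data in samples.items():
        grams = ngrams(data)
        for representative, members in clusters:
            if jaccard(grams, representative) >= threshold:
                members.append(name)
                break
        else:  # no similar cluster found
            clusters.append((grams, [name]))
    return [members for _, members in clusters]
```

With `{"v1": b"malicious_payload_version_1", "v2": b"malicious_payload_version_2", "other": b"completely different program"}`, the two variants share one cluster and `other` gets its own, so only one representative per cluster needs full analysis.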
Sandbox Analysis
Goals
Uncover hidden behaviors.
Method
Trace sample execution.
Challenge
Handle evasion attempts.
Solution 1
Figure: https://blog.virustotal.com/2019/05/virustotal-multisandbox-yoroi-yomi.html
Solution 2
Figure: https://blog.virustotal.com/2019/07/virustotal-multisandbox-sndbox.html
Threat Intelligence
Goal
Identify trends and predict attacks.
How?
Data analytics over analyzed samples.
Challenges
Look at a representative dataset.
We should look at:
Figure: https://www.computerweekly.com/news/252504676/Ransomware-attacks-increase-dramatically-during-2021
Endpoint Protection
Goal
Protect customers on their own machines.
How?
Move the viable analyses to the endpoint.
Challenges
Performance and usability constraints.
Is there a “best”?
Figure: https://www.av-test.org/en/antivirus/home-windows/
Examples
Enhancing Malware Triage
The good side: Separating Code and Data
Figure: Binary Sections Accuracy (AV clustering accuracy (%) vs. similarity score, for All, Text, and Data).
Figure: Binary Sections Recall (AV clustering recall (%) vs. similarity score, for All, Text, and Data).
Source: https://www.sciencedirect.com/science/article/abs/pii/S2666281721001281
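The plots compare clustering quality when the similarity score is computed over all bytes versus only the Text (code) or Data sections. A minimal sketch of section-restricted comparison, assuming each sample has already been split into a hypothetical `{section_name: bytes}` dictionary (a real implementation would parse the PE headers) and using byte 4-gram Jaccard similarity, scaled to 0 to 100, as an assumed scoring function:

```python
def ngrams(data: bytes, n: int = 4) -> set[bytes]:
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(a: bytes, b: bytes) -> float:
    """Jaccard similarity of byte 4-gram sets, scaled to 0-100 like the plots' x-axis."""
    ga, gb = ngrams(a), ngrams(b)
    if not ga and not gb:
        return 100.0
    return 100.0 * len(ga & gb) / len(ga | gb)


def section_similarity(s1: dict[str, bytes], s2: dict[str, bytes], which: str) -> float:
    """Compare 'All' bytes, or a single section such as '.text' (code) or '.data'."""
    if which == "All":
        return similarity(b"".join(s1.values()), b"".join(s2.values()))
    return similarity(s1.get(which, b""), s2.get(which, b""))
```

Two variants with identical code but different embedded configuration score 100 on `.text` and lower on `All`, which illustrates why separating code from data changes clustering results.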
The bad side: Packed Samples
Figure: The impact of UPX packing (share of samples (%) per similarity score, for Packed, Unpacked, and Identical).
Packing reduces samples’ similarity scores.
Figure: Average packed samples’ similarity scheme. Unpacked 1 and Unpacked 2 are each UPX-packed into Packed 1 and Packed 2, and each pair is labeled Similar or Not Similar. Cross-comparisons should be avoided.
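One way to avoid cross-comparisons is to group samples by packing state before comparing them. The sketch below uses byte entropy as a packing heuristic; the 7.0 bits-per-byte threshold is a common rule of thumb assumed here, and it is not a reliable UPX detector (a real pipeline might also look for UPX section names or attempt unpacking).

```python
import math
from collections import Counter
from itertools import combinations


def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in Counter(data).values())


def likely_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Heuristic only: compressed or encrypted payloads tend to have high entropy."""
    return entropy(data) >= threshold


def comparable_pairs(samples: dict[str, bytes]) -> list[tuple[str, str]]:
    """Pairs worth comparing: packed vs. packed and unpacked vs. unpacked, never across."""
    groups: dict[bool, list[str]] = {True: [], False: []}
    for name, data in samples.items():
        groups[likely_packed(data)].append(name)
    return [pair for members in groups.values() for pair in combinations(members, 2)]
```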
Enhancing Malware Tracing
Software-based Sandbox
Figure: System Architecture.
Link: https://link.springer.com/article/10.1007/s11416-017-0292-8
Drawbacks: Anti-VM
Technique | Description | Detection
VM Fingerprint | Check for known strings, such as serial numbers | Check for known strings inside the binary
CPUID Check | Check CPU vendor | Check for known CPU vendor strings
Invalid Opcodes | Launch hypervisor-specific instructions | Check for specific instructions in the binary
System Table Checks | Compare IDT values | Look for checks involving the IDT
HyperCall Detection | Platform-specific feature | Look for specific instructions
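The string-based entries of the Detection column can be sketched as a static scan. The identifiers below (for example the CPUID vendor strings `VMwareVMware`, `KVMKVMKVM`, `Microsoft Hv`, `XenVMMXenVMM`, and `VBoxVBoxVBox`) are commonly cited hypervisor signatures, listed as an illustrative, non-exhaustive assumption; opcode, IDT, and hypercall checks would require disassembly and are not covered.

```python
# Commonly cited hypervisor identifiers (illustrative assumption, not exhaustive).
VM_FINGERPRINTS = {
    "VM Fingerprint": [b"VBOX", b"VMware", b"QEMU", b"VirtualBox"],
    "CPUID Check": [b"VMwareVMware", b"KVMKVMKVM", b"Microsoft Hv",
                    b"XenVMMXenVMM", b"VBoxVBoxVBox"],
}


def scan_for_anti_vm(binary: bytes) -> dict[str, list[bytes]]:
    """Return, per technique, which known strings appear in the binary (case-sensitive)."""
    hits = {}
    for technique, needles in VM_FINGERPRINTS.items():
        found = [needle for needle in needles if needle in binary]
        if found:
            hits[technique] = found
    return hits
```

A hit only says the string is present; plain string matching misses fingerprints that are encoded or built at run time.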
Hardware-based Sandbox
Monitoring Steps
1 Software executes a branch.
2 Processor stores the branch address in a memory page.
3 Processor raises an interrupt.
4 Kernel handles the interrupt.
5 Kernel sends data to userland.
6 Userland inspects this data.
Figure: System Architecture.
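The userland side of the monitoring steps can be sketched as decoding a raw buffer of branch records. The record layout is an assumption modeled on Intel's Branch Trace Store on 64-bit processors (three little-endian 64-bit fields: branch source, branch target, flags); the kernel interface that delivers the buffer is not shown.

```python
import struct
from typing import Iterator

# Assumed record layout: branch source, branch target, flags (8 bytes each).
RECORD = struct.Struct("<QQQ")


def parse_branch_buffer(buf: bytes) -> Iterator[tuple[int, int]]:
    """Decode (source, target) pairs from a raw buffer handed up by the kernel."""
    usable = len(buf) - len(buf) % RECORD.size  # ignore a trailing partial record
    for src, dst, _flags in RECORD.iter_unpack(buf[:usable]):
        yield src, dst
```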
Key Insight: Branches define basic blocks
Figure: Identified branches and basic blocks.
Source: https://dl.acm.org/doi/10.1145/3152162
Figure: CFG Reconstruction.
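Because a branch ends one block and its target starts the next, consecutive branch records are enough to rebuild a control-flow graph. A simplified sketch, assuming a trace of `(source, target)` pairs in execution order, such as those a hardware branch monitor collects; since only taken branches are recorded, each reconstructed block may span several static basic blocks joined by not-taken branches.

```python
def reconstruct_cfg(branches: list[tuple[int, int]]):
    """Each executed block runs from one branch's target to the next branch's source.
    Returns the set of blocks (start, end) and edges between block start addresses."""
    blocks: set[tuple[int, int]] = set()
    edges: set[tuple[int, int]] = set()
    for (_, start), (end, next_start) in zip(branches, branches[1:]):
        blocks.add((start, end))        # block executed between two branches
        edges.add((start, next_start))  # control flows to the next branch target
    return blocks, edges
```

For the trace `[(0x100, 0x200), (0x210, 0x300), (0x320, 0x200), (0x210, 0x300)]`, this yields two blocks and a loop between them (edges `0x200 -> 0x300` and `0x300 -> 0x200`).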
From Tracing to Threat Intelligence
Brazilian Financial Malware on Desktop
Figure: Passive Banker Malware for Santander bank waiting for the user’s credential input.
Figure: Passive Banker Malware for Itaú bank waiting for the user’s credential input.
Link: https://dl.acm.org/doi/10.1145/3429741
Brazilian Financial Malware on Mobile
Figure: BB’s WhatsApp chatbot.
Figure: Bradesco’s WhatsApp chatbot.
Link: https://dl.acm.org/doi/10.1145/3339252.3340103
Brazilian Financial Malware Filetypes
Figure: Evolution of threat’s filetype (share of samples (%) per year, 2012 to 2018, for PE, CPL, .NET, DLL, JAR, JS, and VBE files).
Brazilian malware filetypes. A variety of file formats has been prevalent over the years.
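A chart like this comes from tallying sample filetypes per year. A minimal sketch, assuming `(year, file_bytes)` pairs and classifying only by magic bytes (`MZ` for PE, `PK\x03\x04` for the ZIP container JAR files use, `#@~^` for encoded VBE scripts); telling CPL, DLL, and .NET files apart requires parsing the PE headers, and plain-text JS files fall into `other`.

```python
from collections import Counter, defaultdict


def classify(data: bytes) -> str:
    """Coarse filetype from magic bytes; CPL, DLL and .NET files all start with 'MZ'."""
    if data.startswith(b"MZ"):
        return "PE"
    if data.startswith(b"PK\x03\x04"):
        return "JAR/ZIP"
    if data.startswith(b"#@~^"):
        return "VBE"
    return "other"


def filetype_share(samples: list[tuple[int, bytes]]) -> dict[int, dict[str, float]]:
    """Percentage of each filetype per year, as in a stacked per-year chart."""
    per_year: dict[int, Counter] = defaultdict(Counter)
    for year, data in samples:
        per_year[year][classify(data)] += 1
    return {
        year: {ftype: 100.0 * n / sum(counts.values()) for ftype, n in counts.items()}
        for year, counts in sorted(per_year.items())
    }
```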
More about Brazilian Malware
Figure: Source: https://www.usenix.org/conference/enigma2021/presentation/botacin
From Threat Intelligence to Endpoint Protection
Drawback: Real-time monitoring performance penalty
Figure: AV Monitoring Performance (time (s) per benchmark, Perl, Xalanc, Gobmk, H264, Namd, and Mcf, with Filter AV, SSDT AV, and No AV).
Figure: In-memory AV scans worst-case and best-case performance penalties (execution time (s) per benchmark, perl, namd, bzip, milc, and mcf, with Scan and Baseline).
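Both plots compare execution time with and without the AV. The overhead arithmetic behind "worst-case" and "best-case" can be sketched as follows; the function names are hypothetical and the inputs are `(baseline_seconds, with_av_seconds)` pairs such as those read from a plot.

```python
def overhead_pct(baseline_s: float, with_av_s: float) -> float:
    """Relative slowdown of a run under AV monitoring versus the no-AV baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return 100.0 * (with_av_s - baseline_s) / baseline_s


def worst_and_best(runs: dict[str, tuple[float, float]]) -> tuple[str, str]:
    """Benchmark names with the largest and the smallest relative overhead."""
    ranked = sorted(runs, key=lambda name: overhead_pct(*runs[name]))
    return ranked[-1], ranked[0]
```

For example, a hypothetical benchmark taking 100 s without the AV and 110 s with it has a 10% overhead.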
Hardware AV Architecture
2-level Architecture
Do not fully replace AVs, but add efficient matching capabilities to them.
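The idea of adding efficient matching to an AV, rather than replacing it, can be sketched as a two-level check. This is a software model under stated assumptions, not the hardware design: level 1 stands in for the hardware and only tests fixed-length signature prefixes, while level 2 stands in for the software AV and confirms full signatures. The class and parameter names are hypothetical.

```python
class TwoLevelMatcher:
    """Level 1 (hardware stand-in): cheap lookup of short signature prefixes.
    Level 2 (software AV stand-in): full comparison, run only when level 1 fires."""

    def __init__(self, signatures: list[bytes], prefix_len: int = 4):
        if any(len(sig) < prefix_len for sig in signatures):
            raise ValueError("signatures must be at least prefix_len bytes long")
        self.prefix_len = prefix_len
        self.full = set(signatures)
        self.prefixes = {sig[:prefix_len] for sig in signatures}
        self.escalations = 0  # how often the expensive level-2 check ran

    def scan(self, data: bytes) -> bool:
        for i in range(len(data) - self.prefix_len + 1):
            if data[i:i + self.prefix_len] in self.prefixes:  # level-1 hit
                self.escalations += 1
                if any(data.startswith(sig, i) for sig in self.full):  # level-2 confirm
                    return True
        return False
```

Benign data that never matches a prefix never reaches level 2, which is where the savings over always running the full software check would come from.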
Performance Characterization
Figure: AV Monitoring Overhead (CPU (%) over time (s) for HEAVEN+AV, AV, and No-AV).
2-Phase HEAVEN CPU Performance
The inspection phase causes occasional, quick bursts of CPU usage. The AV operating alone incurs a continuous 10% performance overhead.
A first idea: Hardware features as signatures
Figure: Two-level branch predictor. A sequence window of taken (1) and not-taken (0) branches is stored in the Global History Register (GHR).
Figure: Branch patterns coverage (percentage of signature collision in the k-bit space vs. branch pattern length, k = 8 to 40 bits).
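The GHR is a shift register, so a k-bit branch-pattern signature is simply the register contents after k branch outcomes. A sketch that extracts every k-bit window from a taken/not-taken sequence and measures how many of one program's patterns also appear in another's; this collision definition is an assumption and may differ from the metric in the plot.

```python
def ghr_windows(outcomes: list[int], k: int) -> set[int]:
    """Slide a k-bit global history register over taken (1) / not-taken (0) outcomes,
    collecting every full k-bit pattern it holds."""
    mask = (1 << k) - 1
    ghr, patterns = 0, set()
    for i, taken in enumerate(outcomes):
        ghr = ((ghr << 1) | (taken & 1)) & mask  # shift in the newest outcome
        if i + 1 >= k:  # the register is full only after k outcomes
            patterns.add(ghr)
    return patterns


def collision_pct(signature: set[int], other: set[int]) -> float:
    """Share of a signature's patterns that another program also produces."""
    if not signature:
        return 0.0
    return 100.0 * len(signature & other) / len(signature)
```

Longer windows (larger k) make each pattern more specific, which is the trade-off the plot explores across pattern lengths.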
Result: Performance penalty reduction
Figure: Performance evaluation when tracking all function calls (cycles, log scale, from 1×10^8 to 1×10^14, on the blender, nab, roms, bwaves, djeng, perl, cam4, cactus, omnetpp, mcf, wrf, x264, xzr, leela, parest, lbm, namd, imagick, povray, xalanc, gcc, and echg2 benchmarks). Comparison between execution without AV (BASE), execution with software AV (AVSW), and execution with the proposed coprocessor model (AVHW).
Research Opportunities
Deep Learning:
From Images to Binaries
Malware Binaries as Textures
Figure: Source: https://link.springer.com/chapter/10.1007/978-3-030-30215-3_19
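Treating a binary as a texture usually means reading its bytes as grayscale pixels, row by row. A minimal sketch of that mapping, assuming a fixed row width of 16 and zero padding (both illustrative choices; the cited chapter's exact preprocessing may differ), producing the 2D array that an image model would consume after resizing:

```python
def bytes_to_image(data: bytes, width: int = 16) -> list[list[int]]:
    """Reshape a binary into rows of `width` grayscale pixels (0-255);
    the last row is zero-padded to full width."""
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows
```

Sections with different content (code, strings, compressed data) tend to produce visibly different textures, which is what an image classifier can pick up.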
Adversarial Machine Learning
Detection Bypasses
Adversarial Machine Learning
Figure: Source: https://github.com/marcusbotacin/Talks/tree/master/Waikato
Adversarial Malware
Figure: Dropper Strategy.
Figure: Data Appendix Result.
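Appending data works against static models because bytes added after the end of an executable image are generally ignored at load time but still change file-level features. A sketch using a normalized byte histogram as an assumed example feature (real detectors typically use richer features), showing that appended bytes shift the feature vector while leaving the original bytes untouched:

```python
def byte_histogram(data: bytes) -> list[float]:
    """Normalized 256-bin byte histogram, one feature family used by static ML detectors."""
    counts = [0] * 256
    for value in data:
        counts[value] += 1
    total = len(data) or 1  # avoid dividing by zero on empty input
    return [c / total for c in counts]


def l1_distance(h1: list[float], h2: list[float]) -> float:
    """How far the feature vector moved (0.0 = unchanged, 2.0 = maximum)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

Doubling a file's size with appended bytes that never occur in it moves this feature by half its maximum distance, even though the code itself is unchanged.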
ML Evasion Contest
Figure: mlsec.io
Figure: https://cujo.com/machine-learning-security-evasion-competition-2020-results-and-behind-the-scenes/
Transition to Practice:
Analysis Platforms
A Current Public Malware Analysis Platform
Figure: https://app.any.run
Recap & Remarks
Summary
Malware Detection
No definitive solution, but a pipeline of attempts.
The world is better off with some approximation of security.
Academic Contributions
Better Triage with Similarity Hashing
Better Analyses with new Sandboxes
Better Threat Intelligence for Brazilian Malware
Better endpoint protection with Hardware AVs
Moving Forward
Open research positions. Get in touch!
Thanks!
Questions? Comments?
@MarcusBotacin
botacin@tamu.edu
marcusbotacin.github.io