Multi level ransomware analysis MALCON 2019 conference

October 1 – 4 2019
A Multi-Level Ransomware Detection
Framework using Natural Language
Processing and Machine Learning
By: Subash Poudyal, Dipankar Dasgupta,
Zahid Akhtar, Kishor Datta Gupta

About Me
• Subash Poudyal (@subash_spp)
• Security Researcher at the Center for Information Assurance (CfIA)
• PhD student at the University of Memphis

Purpose
• Ransomware detection framework
• Multi-level (DLL, function call, assembly) feature mining
• Uses NLP and machine learning approaches
• Apache Spark for feature processing
• PE parser and the objdump tool on Linux
• N-gram probability and Term Frequency-Inverse Document Frequency (TF-IDF) from NLP

Background Information
• Ransomware is evolving and causing damage.
• Advanced malware: it encrypts your data and demands a ransom in bitcoin for anonymity.
[9]

Some recent incidents
• Mar. 2019: Georgia county pays a whopping $400,000 to get rid of a ransomware infection
• June 2019: Riviera Beach and Lake City councils in Florida agreed to pay $600,000 and $500,000, respectively, to get their data back
• July 2019: South Africa City Power grid attack
• August 2019: 22 municipalities/local governments in Texas, demand of $2.5M
• August 2019: Community hospital in Washington, $1 million demanded
[5,6,8,7,2]

Damage caused by Ransomware
[3]

Map of Ransomware detection
[10]
Ransomware detections across organizations in USA from Jan- Aug 2019

Ransomware attack steps
• Phishing email
• Run payload and download code
• Generate key and encrypt
• Communicate with C&C
• Select different file types for encryption
• Display message on the victim's desktop

Previous work on malware detection
• Canzanese et al. [11] analyzed system call traces using an n-gram language model and TF-IDF for malware detection
• Zhang et al. [27] used n-grams of opcode sequences for ransomware family classification
• Yuki et al. [12] proposed ransomware detection using API calls and SVM
• Poudyal et al. [12] used DLL and assembly instruction frequencies for ransomware detection
• Difference: each of these studies works at one level or another, whereas we work with three levels

Hypothesis
• We can detect ransomware with improved accuracy (98.59% accuracy, 0.03 FPR)
  • By reverse engineering and mining multi-level features
  • By leveraging NLP and ML techniques
• Previous approaches have worked at a single level
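
To make the two metrics named above concrete, here is a minimal sketch of how accuracy and false positive rate (FPR) are computed from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not the results of this study.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples (ransomware and benign) classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of benign samples wrongly flagged as ransomware."""
    return fp / (fp + tn)


# Hypothetical counts for 100 ransomware and 100 benign binaries.
tp, fn = 95, 5   # ransomware detected / missed
tn, fp = 97, 3   # benign passed / wrongly flagged

acc = accuracy(tp, tn, fp, fn)
fpr = false_positive_rate(fp, tn)
print(f"accuracy={acc:.2%}  FPR={fpr:.2f}")
```

A low FPR matters here because each false positive is a benign executable that would be blocked or quarantined.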

Details of DLL Hierarchy

Details of Function calls Hierarchy

DLL and function-call level code segment of Locky ransomware

Assembly-level code segment of Locky ransomware

The Proposed Multi-level Framework

The Proposed Framework
• The DLL tracker analyzes the DLLs of a given binary using the detection engine
• The function call tracker analyzes the function calls of a binary
• The assembly instruction tracker works similarly to the DLL and function call trackers
  • Detection counter checked
• Action engine
  • Passive analyzer

Workflow of Detector Engine

Workflow of Detector Engine
• Reverse-engineer using a PE parser and objdump
• Multi-level mining using three different extractors
1. DLL extractor
2. Function call extractor
3. Assembly instruction extractor
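
As an illustration of the assembly instruction extractor, the sketch below pulls instruction mnemonics out of `objdump -d` text. It is a minimal sketch and not the authors' extractor: the function names and the sample disassembly are hypothetical, and it assumes GNU objdump's tab-separated layout (address, raw bytes, instruction). The DLL and function call extractors, which rely on a PE parser, are not shown.

```python
import subprocess
from typing import List


def disassemble(path: str) -> str:
    """Run `objdump -d` on a binary and return its text output."""
    result = subprocess.run(["objdump", "-d", path],
                            capture_output=True, text=True, check=True)
    return result.stdout


def extract_mnemonics(disassembly: str) -> List[str]:
    """Return the instruction mnemonics in listing order.

    Instruction lines have three tab-separated fields: "addr:", the raw
    bytes, and "mnemonic operands". Headers and byte-continuation lines
    have fewer fields and are skipped.
    """
    mnemonics = []
    for line in disassembly.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3 and fields[0].strip().endswith(":"):
            parts = fields[2].split()
            if parts:
                mnemonics.append(parts[0])
    return mnemonics


# Hypothetical sample in objdump's layout, so the sketch runs without a binary.
SAMPLE = (
    "Disassembly of section .text:\n"
    "\n"
    "00401000 <.text>:\n"
    "  401000:\t55                   \tpush   %ebp\n"
    "  401001:\t89 e5                \tmov    %esp,%ebp\n"
    "  401003:\t83 ec 18             \tsub    $0x18,%esp\n"
    "  401006:\te8 f5 0f 00 00       \tcall   0x402000\n"
    "  40100b:\tc9                   \tleave\n"
    "  40100c:\tc3                   \tret\n"
)

opcodes = extract_mnemonics(SAMPLE)
print(opcodes)
```

The resulting mnemonic sequence is the kind of token stream that the n-gram step can then consume.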

NLP Schemes
• Proven useful in recommendation systems, text classification, speech recognition, and so on
• N-gram generator: produces the unique set of n-gram sequences
• Markov assumption: only the immediately preceding N-1 words are considered
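
The n-gram generator can be sketched in a few lines. This is a minimal illustration, not the authors' Spark implementation; the function names and the opcode sequence are hypothetical.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous window of n tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unique_ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of distinct n-grams in the sequence."""
    return set(ngrams(tokens, n))


# Hypothetical assembly mnemonic sequence from one binary.
opcodes = ["push", "mov", "sub", "push", "mov", "call"]

bigrams = ngrams(opcodes, 2)
trigrams = ngrams(opcodes, 3)
print(bigrams)
print(len(unique_ngrams(opcodes, 2)), "unique bigrams")
```

The same function applies unchanged to DLL names or function call names, since it only assumes a list of string tokens.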

NLP Schemes
• N-gram probability: relative frequency in a training corpus
• TF-IDF: term frequency × inverse document frequency
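
Both quantities can be sketched in plain Python. This is a minimal illustration under stated assumptions: the slides do not give the exact variants used, so the sketch assumes the maximum-likelihood n-gram estimate count(w1..wn) / count(w1..wn-1) and the common tf × log(N / df) weighting. The corpus and function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous window of n tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_probability(corpus, gram):
    """Relative-frequency estimate of P(last token | preceding tokens)."""
    n = len(gram)
    if n < 2:
        raise ValueError("need at least a bigram")
    gram_count = sum(ngrams(doc, n).count(gram) for doc in corpus)
    prefix_count = sum(ngrams(doc, n - 1).count(gram[:-1]) for doc in corpus)
    return gram_count / prefix_count if prefix_count else 0.0


def tf_idf(corpus, n):
    """Return one {n-gram: weight} dict per document."""
    docs = [Counter(ngrams(doc, n)) for doc in corpus]
    # Document frequency: number of documents containing each n-gram.
    df = Counter(g for d in docs for g in d)
    total_docs = len(docs)
    return [
        {g: (c / sum(d.values())) * math.log(total_docs / df[g])
         for g, c in d.items()}
        for d in docs
    ]


# Hypothetical corpus: one mnemonic sequence per binary.
corpus = [
    ["push", "mov", "call"],
    ["push", "mov", "sub"],
    ["mov", "call", "ret"],
]

p = ngram_probability(corpus, ("mov", "call"))   # P(call | mov)
weights = tf_idf(corpus, 2)
print(p)
print(weights[1])
```

In the second document the bigram that occurs in no other document receives the higher weight, which is the property that makes TF-IDF useful for separating classes.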

Experimental Setup
• Dataset: VirusTotal and the open-source malware repository theZoo
• 292 ransomware binaries and the same number of benign executables
• Apache Spark cluster configuration:
  • 4 data nodes and 1 name node, each with 16 GB RAM and 8 cores
  • Ubuntu 16.04.3 operating system with 1 TB disk
  • Hadoop 2.7.3
  • Spark 2.3
• Experimental programs written in Python, using the MLlib library from PySpark

Experiment
• Feature generation using an n-gram language model

Experiment (cont.)

Experiment (cont.)

• Logistic regression accuracy for n-gram TF-IDF features at the multi-level and combined level
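
For readers unfamiliar with the classifier, the sketch below trains a logistic regression model by batch gradient descent on TF-IDF-style feature vectors. It is a plain-Python stand-in for illustration only: the experiments used MLlib from PySpark, and the feature values, labels, and hyperparameters here are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Return 1 (ransomware) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical TF-IDF weights of two n-gram features per binary.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.0],   # labelled ransomware
     [0.1, 0.8], [0.2, 0.9], [0.0, 0.7]]   # labelled benign
y = [1, 1, 1, 0, 0, 0]

w, b = train_logreg(X, y)
predictions = [predict(w, b, x) for x in X]
train_accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(train_accuracy)
```

In a real evaluation the accuracy would be measured on held-out binaries rather than on the training set used in this toy example.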

Trigram analysis

Impact & Broader Contributions
• Provides a new, effective approach to ransomware detection
• Implements multi-level features for improved detection
• Provides background for further analysis in multi-level relation mapping

Conclusion
• An efficient multi-level ransomware detection framework (98.59% accuracy, 0.03 FPR)
• Leverages reverse engineering, data mining, NLP, and supervised ML techniques
• Practical implementation is feasible

Future Work & Remaining Questions
• Multi-level analysis leveraging deep learning techniques on a larger dataset
• Performance comparison between relevant techniques
• Effect of code obfuscation techniques on machine learning detection
• We welcome collaboration with industry or universities on ransomware research

Thanks
Questions and comments?
Subash Poudyal
@subash_spp
connect.subash@gmail.com

References
[1] https://www.techspot.com/news/79119-jackson-county-government-gives-hackers-pays-400000.html
[2] https://healthitsecurity.com/news/hackers-demand-1m-in-grays-harbor-ransomware-attack
[3] https://heimdalsecurity.com/blog/cyber-security-threats-types/
[4] https://Malwarebytes.com
[5] https://www.zdnet.com/article/georgia-county-pays-a-whopping-400000-to-get-rid-of-a-ransomware-infection/
[6] https://www.zdnet.com/article/second-florida-city-pays-giant-ransom-to-ransomware-gang-in-a-week/
[7] https://www.npr.org/2019/08/20/752695554/23-texas-towns-hit-with-ransomware-attack-in-new-front-of-cyberassault
[8] https://www.msspalert.com/cybersecurity-breaches-and-attacks/ransomware/city-power-johannesburg-south-africa/
[9] https://www.techspot.com/news/79119-jackson-county-government-gives-hackers-pays-400000.html
[10] https://blog.malwarebytes.com/ransomware/2019/05/ransomware-isnt-just-a-big-city-problem/
[11] R. Canzanese, S. Mancoridis, and M. Kam. System call-based detection of malicious processes. In 2015 IEEE International Conference on Software Quality, Reliability and Security, pages 119–124. IEEE, 2015.
[12] S. Poudyal, K. P. Subedi, and D. Dasgupta. A framework for analyzing ransomware using machine learning. In 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pages 1692–1699. IEEE, 2018.
