June 13, 2018
MASSE ANALYSIS MODULES
Experimental Results (6 mos)
Alexander Zhdanov
MASSE
TAMIS
Inria Rennes - Bretagne Atlantique
MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 2
introduction
This presentation summarizes the experimental results from the first
6 months of research. It explains the algorithms used, the experimental
setup, and the datasets, and analyzes the resulting output.
Outline
Problem formulation
  MASSE-Overview
  Yara rules
two algorithms
  n-gram based Markov model difference (baseline)
  Genetic Algorithm (GA)
experiments
  setup
conclusions and discussion
1 Problem formulation
MASSE - Yara rules
MASSE-Overview
Yara rules
The YARA library and scanner are a de facto standard for
signature-based malware scanning of files
The YARA signature rule format is an easy-to-understand
DSL with a C-like syntax
rule silent_banker : banker
{
    meta:
        description = "This is just an example"
        threat_level = 3
        in_the_wild = true
    strings:
        $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
        $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
        $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
    condition:
        $a or $b or $c
}
A YARA rule contains the following sections:
meta: additional information about the rule
strings: hexadecimal strings, text strings, and regular expressions
condition: a boolean expression (with variables)
the pattern matching swiss army knife
Usage: yara [OPTION]... RULES_FILE FILE | DIR | PID
pros:
easy to read and understand
fast classification (string/pattern matching)
fast sharing and updating of YARA rule databases (VirusTotal)
cons:
static signatures are not robust to malware mutation, packing, and obfuscation
YARA rules are written manually (performance, optimality)
Q: Why is YARA so popular?
A: regular-expression-like syntax for hex strings (wildcards, jumps, alternatives)
E2 34 ?? C8 A? FB
F4 23 [4-6] 62 B4
F4 23 ( 62 B4 | 56 ) 45
FE 39 45 [10-] 89 00
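To make the meaning of these patterns concrete, the sketch below translates a YARA-style hex string into an equivalent Python bytes regex. It is an illustration only, not how YARA itself matches. The function name `yara_hex_to_regex` is hypothetical, and the sketch assumes tokens are separated by whitespace and that alternatives are written with `|`.

```python
import re

def yara_hex_to_regex(pattern: str) -> "re.Pattern[bytes]":
    """Translate a whitespace-separated YARA-style hex string into a bytes regex."""
    out = []
    for tok in pattern.split():
        if tok == "(":
            out.append("(?:")                # start of a group of alternatives
        elif tok in (")", "|"):
            out.append(tok)
        elif tok.startswith("["):            # jump: [n], [lo-hi] or [lo-]
            lo, dash, hi = tok[1:-1].partition("-")
            out.append(".{%s}" % (lo + "," + hi if dash else lo))
        elif tok == "??":                    # any byte
            out.append(".")
        elif tok[1] == "?":                  # fixed high nibble, e.g. A?
            base = int(tok[0], 16) << 4
            out.append("[\\x%02x-\\x%02x]" % (base, base | 0x0F))
        elif tok[0] == "?":                  # fixed low nibble, e.g. ?B
            low = int(tok[1], 16)
            out.append("[" + "".join("\\x%02x" % (h << 4 | low) for h in range(16)) + "]")
        else:                                # plain byte
            out.append("\\x%02x" % int(tok, 16))
    # DOTALL so that "." also matches the newline byte 0x0A
    return re.compile("".join(out).encode("ascii"), re.DOTALL)

rx = yara_hex_to_regex("E2 34 ?? C8 A? FB")
print(bool(rx.search(bytes.fromhex("E23411C8A7FB"))))  # A7 fits the A? nibble wildcard
```

Here `??` matches any byte, `A?` any byte whose high nibble is A, `[4-6]` a jump of four to six arbitrary bytes, and `[10-]` a jump of at least ten bytes.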
2 two algorithms
n-gram based Markov model difference (baseline) & GA
n-gram based Markov model difference (baseline)
[Figure: difference of n-gram based Markov models]
calculate n-gram Markov model for cleanware
calculate n-gram Markov model for malware
subtract the two models
(optional) subtract models for other malware families (diff)
filter n-grams using two-step filtration:
  sort
  calculate entropy
select top (number of bytes)
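The baseline steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the MASSE implementation: relative n-gram frequencies stand in for the Markov model, the entropy step is taken to be the byte-level Shannon entropy of each n-gram, and the names `ngram_model`, `byte_entropy`, `baseline_signature` and the parameters `top` and `min_entropy` are hypothetical.

```python
import math
from collections import Counter

def ngram_model(samples, n=5):
    """Relative frequency of every byte n-gram over a list of byte strings."""
    counts = Counter()
    for data in samples:
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}

def byte_entropy(gram):
    """Shannon entropy (bits) of the byte values inside one n-gram."""
    counts = Counter(gram)
    size = len(gram)
    return -sum(c / size * math.log2(c / size) for c in counts.values())

def baseline_signature(malware, cleanware, n=5, top=10, min_entropy=1.0):
    mal = ngram_model(malware, n)
    clean = ngram_model(cleanware, n)
    # subtract the two models: keep what is more frequent in malware
    diff = {g: p - clean.get(g, 0.0) for g, p in mal.items()}
    ranked = sorted(diff, key=diff.get, reverse=True)           # step 1: sort
    kept = [g for g in ranked
            if diff[g] > 0 and byte_entropy(g) >= min_entropy]  # step 2: entropy
    return kept[:top]                                           # select top

malware = [b"\x00" * 20 + b"ABCDEFGH", b"ABCDEFGH" + b"\x00" * 10]
cleanware = [b"\x00" * 30 + b"hello world"]
print(baseline_signature(malware, cleanware))
```

The entropy threshold is what drops low-information n-grams such as runs of zero padding, which would otherwise dominate the ranking.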
Genetic Algorithm (GA) steps
calculate n-gram Markov model of malware
apply two-step filtration:
  sort
  calculate entropy
generate a new population
calculate F1 scores for the new population
while the termination condition is not reached:
apply mutation
apply crossover
replace min elements in the population with children
select best individuals
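A compact sketch of such a loop is given below. It makes several assumptions that are not in the slides: an individual is a set of candidate strings combined with "any of them", the fitness is the F1 score on labelled samples, crossover takes half of the union of two parents, and mutation swaps one string for a random candidate. The names `f1_of_rule` and `evolve` and all default values are hypothetical.

```python
import random

def f1_of_rule(rule, malware, cleanware):
    """F1 of an 'any string matches' rule on labelled byte samples."""
    tp = sum(any(s in m for s in rule) for m in malware)
    fn = len(malware) - tp
    fp = sum(any(s in c for s in rule) for c in cleanware)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def evolve(candidates, malware, cleanware, pop_size=20, n_best=5,
           max_cycles=10, seed=0):
    rng = random.Random(seed)

    def fitness(rule):
        return f1_of_rule(rule, malware, cleanware)

    # generate a new population: random small subsets of the candidate strings
    max_len = min(3, len(candidates))
    population = [frozenset(rng.sample(candidates, rng.randint(1, max_len)))
                  for _ in range(pop_size)]
    for _ in range(max_cycles):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == 1.0:       # termination: perfect score
            break
        parents = population[:n_best]           # select best individuals
        children = []
        for _ in range(n_best):
            a, b = rng.sample(parents, 2)
            pool = sorted(a | b)
            child = set(rng.sample(pool, max(1, len(pool) // 2)))  # crossover
            if rng.random() < 0.5:                                 # mutation
                child.discard(rng.choice(sorted(child)))
                child.add(rng.choice(candidates))
            children.append(frozenset(child))
        population[-len(children):] = children  # replace the weakest elements
    return max(population, key=fitness)

cands = [b"EVIL", b"MZ", b"data", b"xx"]
mal = [b"..EVIL..MZ", b"MZ EVIL data"]
clean = [b"MZ data", b"data only"]
best = evolve(cands, mal, clean)
print(sorted(best), f1_of_rule(best, mal, clean))
```

Because the best individuals are never replaced, the best fitness in the population cannot decrease from one cycle to the next.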
[Figure: Genetic Algorithm steps]
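F1 score (binary classification)
\[ \mathrm{precision} = \frac{tp}{tp + fp}, \qquad \mathrm{recall} = \frac{tp}{tp + fn} \]
\[ F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \]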
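As a small worked example of the formulas above, the following function computes precision, recall, and F1 from raw counts. The name `f1_score` and the convention of returning 0.0 when there are no true positives are choices made for this sketch.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives and false negatives."""
    if tp == 0:
        return 0.0  # convention for this sketch: avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 8 malware samples detected, 2 missed, 2 cleanware files flagged by mistake
print(f1_score(tp=8, fp=2, fn=2))
```

With 8 true positives, 2 false positives, and 2 false negatives, precision and recall are both 0.8, so F1 is also 0.8 up to floating-point rounding.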
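F1 score (binary classification with rejection)
\[ \mathrm{precision} = \frac{tp}{tp + fp}, \qquad \mathrm{recall} = \frac{tp}{tp + fn} \]
\[ F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \]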
Stopping criteria and a minimization function of the Genetic Algorithm
((num_unique - self.config.num_unique_el) + (eval_score - self.config.max_score)) * ((num_cycles - self.config.max_num_cycles) + ((prev_score - eval_score) - self.config.prec))
where
num_unique is the number of unique elements in the generation
eval_score is the average F1 score calculated for the current generation
num_cycles is the number of cycles for the current generation
self.config.prec is the lower bound on the change of the minimization function
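The expression can be evaluated as in the sketch below. The slides do not say how the value is compared or thresholded to stop the loop, so the code only computes it. The class `GAConfig` and function `stopping_value` are hypothetical names; the defaults for `num_unique_el`, `max_score`, and `max_num_cycles` are the values from the experimental setup, while the value of `prec` is an assumption.

```python
from dataclasses import dataclass

@dataclass
class GAConfig:
    num_unique_el: int = 2     # number of unique elements expected
    max_score: float = 1.0     # max score to evaluate
    max_num_cycles: int = 10   # max number of cycles
    prec: float = 1e-3         # assumption: not given in the slides

def stopping_value(cfg, num_unique, eval_score, prev_score, num_cycles):
    """Product of the two factors of the stopping expression."""
    population_term = ((num_unique - cfg.num_unique_el)
                       + (eval_score - cfg.max_score))
    progress_term = ((num_cycles - cfg.max_num_cycles)
                     + ((prev_score - eval_score) - cfg.prec))
    return population_term * progress_term

cfg = GAConfig()
print(stopping_value(cfg, num_unique=50, eval_score=0.5, prev_score=0.4, num_cycles=3))
```

The first factor vanishes once the generation has collapsed to the expected number of unique elements at the maximum score, which makes the whole product zero.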
3 experimental results
5 malware families
experiments setup
datasets
cleanware
  10 ELF files, both packed and unpacked
malware
  5 malware families:
    blihan
    rebhip
    viking
    vmprotect (packer)
    zvuzona
in total 217 binaries
blihan and zvuzona are unpacked
algorithmic parameters
n-gram based Markov model difference (baseline)
  number of bytes in n-gram: 5
Genetic Algorithm (GA)
  number of individuals in a generation: 1000
  number of selected individuals: 100
  Gaussian distribution of chromosomes with parameters: mu = 4, sigma = 1
  number of bits in the mutation step: 2
  max score to evaluate: 1.0
  number of unique elements expected: 2
  max number of cycles: 10
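One possible reading of the Gaussian parameter is that the number of strings per individual in the initial population is drawn from a normal distribution with mu = 4 and sigma = 1. The sketch below illustrates that reading only. The function name `sample_rule_lengths`, the rounding to integers, and the lower bound of one string are assumptions.

```python
import random
from collections import Counter

def sample_rule_lengths(n_individuals=1000, mu=4.0, sigma=1.0, seed=0):
    """Draw the number of strings for each individual of a generation."""
    rng = random.Random(seed)
    # round to an integer length and keep at least one string per rule
    return [max(1, round(rng.gauss(mu, sigma))) for _ in range(n_individuals)]

lengths = sample_rule_lengths()
for size, count in sorted(Counter(lengths).items()):
    print(size, count)
```

Printing the histogram gives a quick check that most individuals start with about four strings.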
[Figure: prior distribution of individuals by the number of strings]
[Figure: F1 scores, binary classification]
[Figure: F1 scores, binary classification with rejection]
[Figure: length of YARA rules (number of strings)]
4 conclusions and discussion
implemented construction of syntactic malware/cleanware Markov models based on n-grams
implemented three algorithms for YARA rule generation:
  n-gram based Markov model difference (baseline)
  Genetic Algorithm (GA)
  n-gram based Markov model difference with multiple models (diff)
both baseline and GA work on unpacked malware (packed: packer signatures)
parameters of GA are chosen so that the algorithm does a fast evaluation
for GA, detection rates depend on the number of generated individuals in a population (higher coverage)
for binary classification:
  GA has the same detection rate as baseline for blihan and zvuzona
  for vmprotect, GA produces a higher detection rate
  for rebhip and viking, GA has slightly lower detection rates (baseline/GA: 0.98/0.93 and 0.90/0.86)
for binary classification with rejection:
  GA produces signatures with better detection rates than baseline (significantly better for rebhip and viking)
the multiple-models heuristic (diff) does not produce signatures on packed/obfuscated malware
for zvuzona, the multiple-models heuristic (diff) produces the same detection rate as the Genetic Algorithm
length of the produced YARA rules:
  on average, the Genetic Algorithm produces shorter YARA rules than baseline: 4 vs. 39.2 strings
future work
extend cleanware dataset
run tests on more malware families (need for good packing detector/extractor)
improve Genetic Algorithm:
  run more experiments with higher parameter values: more cycles, more individuals, higher mutation rates, ...
use more machine learning techniques
  Hidden Markov Models (HMM)
THANKS

More Related Content

Similar to Presentationeng

Music Genre Classification using Machine Learning
Music Genre Classification using Machine LearningMusic Genre Classification using Machine Learning
Music Genre Classification using Machine LearningIRJET Journal
 
Genome-wide Association Study (GWAS) Analysis Guide in TASSEL Software (GUI).pdf
Genome-wide Association Study (GWAS) Analysis Guide in TASSEL Software (GUI).pdfGenome-wide Association Study (GWAS) Analysis Guide in TASSEL Software (GUI).pdf
Genome-wide Association Study (GWAS) Analysis Guide in TASSEL Software (GUI).pdfRezaDystaSatria
 
CCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression DataCCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression DataIRJET Journal
 
A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...IOSR Journals
 
Effect of Feature Selection on Gene Expression Datasets Classification Accura...
Effect of Feature Selection on Gene Expression Datasets Classification Accura...Effect of Feature Selection on Gene Expression Datasets Classification Accura...
Effect of Feature Selection on Gene Expression Datasets Classification Accura...IJECEIAES
 
TOWARDS PREDICTING SOFTWARE DEFECTS WITH CLUSTERING TECHNIQUES
TOWARDS PREDICTING SOFTWARE DEFECTS WITH CLUSTERING TECHNIQUESTOWARDS PREDICTING SOFTWARE DEFECTS WITH CLUSTERING TECHNIQUES
TOWARDS PREDICTING SOFTWARE DEFECTS WITH CLUSTERING TECHNIQUESijaia
 
PREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial Approach
PREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial ApproachPREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial Approach
PREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial Approachahmet furkan emrehan
 
New Rough Set Attribute Reduction Algorithm based on Grey Wolf Optimization
New Rough Set Attribute Reduction Algorithm based on Grey Wolf OptimizationNew Rough Set Attribute Reduction Algorithm based on Grey Wolf Optimization
New Rough Set Attribute Reduction Algorithm based on Grey Wolf OptimizationAboul Ella Hassanien
 
Log Message Anomaly Detection with Oversampling
Log Message Anomaly Detection with Oversampling Log Message Anomaly Detection with Oversampling
Log Message Anomaly Detection with Oversampling gerogepatton
 
FPGA Implementation of a GA
FPGA Implementation of a GAFPGA Implementation of a GA
FPGA Implementation of a GAHocine Merabti
 
THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...
THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...
THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...ijaia
 
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...EuroIoTa
 
A Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug PredictionA Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug PredictionMartin Pinzger
 
Using Interactive Genetic Algorithm for Requirements Prioritization
Using Interactive Genetic Algorithm for Requirements PrioritizationUsing Interactive Genetic Algorithm for Requirements Prioritization
Using Interactive Genetic Algorithm for Requirements Prioritization Francis Palma
 
Iaetsd an efficient way of detecting a numbers in car
Iaetsd an efficient way of detecting a numbers in carIaetsd an efficient way of detecting a numbers in car
Iaetsd an efficient way of detecting a numbers in carIaetsd Iaetsd
 

Similar to Presentationeng (20)

my IEEE
my IEEEmy IEEE
my IEEE
 
My
MyMy
My
 
Music Genre Classification using Machine Learning
Music Genre Classification using Machine LearningMusic Genre Classification using Machine Learning
Music Genre Classification using Machine Learning
 
Genome-wide Association Study (GWAS) Analysis Guide in TASSEL Software (GUI).pdf
Genome-wide Association Study (GWAS) Analysis Guide in TASSEL Software (GUI).pdfGenome-wide Association Study (GWAS) Analysis Guide in TASSEL Software (GUI).pdf
Genome-wide Association Study (GWAS) Analysis Guide in TASSEL Software (GUI).pdf
 
CCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression DataCCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression Data
 
June 2016 - Zuogong
June 2016 - ZuogongJune 2016 - Zuogong
June 2016 - Zuogong
 
Folker Meyer: Metagenomic Data Annotation
Folker Meyer: Metagenomic Data AnnotationFolker Meyer: Metagenomic Data Annotation
Folker Meyer: Metagenomic Data Annotation
 
A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...
 
Effect of Feature Selection on Gene Expression Datasets Classification Accura...
Effect of Feature Selection on Gene Expression Datasets Classification Accura...Effect of Feature Selection on Gene Expression Datasets Classification Accura...
Effect of Feature Selection on Gene Expression Datasets Classification Accura...
 
TOWARDS PREDICTING SOFTWARE DEFECTS WITH CLUSTERING TECHNIQUES
TOWARDS PREDICTING SOFTWARE DEFECTS WITH CLUSTERING TECHNIQUESTOWARDS PREDICTING SOFTWARE DEFECTS WITH CLUSTERING TECHNIQUES
TOWARDS PREDICTING SOFTWARE DEFECTS WITH CLUSTERING TECHNIQUES
 
PREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial Approach
PREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial ApproachPREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial Approach
PREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial Approach
 
New Rough Set Attribute Reduction Algorithm based on Grey Wolf Optimization
New Rough Set Attribute Reduction Algorithm based on Grey Wolf OptimizationNew Rough Set Attribute Reduction Algorithm based on Grey Wolf Optimization
New Rough Set Attribute Reduction Algorithm based on Grey Wolf Optimization
 
Log Message Anomaly Detection with Oversampling
Log Message Anomaly Detection with Oversampling Log Message Anomaly Detection with Oversampling
Log Message Anomaly Detection with Oversampling
 
FPGA Implementation of a GA
FPGA Implementation of a GAFPGA Implementation of a GA
FPGA Implementation of a GA
 
THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...
THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...
THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...
 
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
 
Resume Abhishek Roushan
Resume Abhishek RoushanResume Abhishek Roushan
Resume Abhishek Roushan
 
A Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug PredictionA Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug Prediction
 
Using Interactive Genetic Algorithm for Requirements Prioritization
Using Interactive Genetic Algorithm for Requirements PrioritizationUsing Interactive Genetic Algorithm for Requirements Prioritization
Using Interactive Genetic Algorithm for Requirements Prioritization
 
Iaetsd an efficient way of detecting a numbers in car
Iaetsd an efficient way of detecting a numbers in carIaetsd an efficient way of detecting a numbers in car
Iaetsd an efficient way of detecting a numbers in car
 

Recently uploaded

Seizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networksSeizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networksIJECEIAES
 
Worksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxWorksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxMustafa Ahmed
 
What is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, FunctionsWhat is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, FunctionsVIEW
 
NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...
NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...
NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...Amil baba
 
Diploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfDiploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfJNTUA
 
Independent Solar-Powered Electric Vehicle Charging Station
Independent Solar-Powered Electric Vehicle Charging StationIndependent Solar-Powered Electric Vehicle Charging Station
Independent Solar-Powered Electric Vehicle Charging Stationsiddharthteach18
 
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...josephjonse
 
Intro to Design (for Engineers) at Sydney Uni
Intro to Design (for Engineers) at Sydney UniIntro to Design (for Engineers) at Sydney Uni
Intro to Design (for Engineers) at Sydney UniR. Sosa
 
Interfacing Analog to Digital Data Converters ee3404.pdf
Interfacing Analog to Digital Data Converters ee3404.pdfInterfacing Analog to Digital Data Converters ee3404.pdf
Interfacing Analog to Digital Data Converters ee3404.pdfragupathi90
 
Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...IJECEIAES
 
engineering chemistry power point presentation
engineering chemistry  power point presentationengineering chemistry  power point presentation
engineering chemistry power point presentationsj9399037128
 
Artificial intelligence presentation2-171219131633.pdf
Artificial intelligence presentation2-171219131633.pdfArtificial intelligence presentation2-171219131633.pdf
Artificial intelligence presentation2-171219131633.pdfKira Dess
 
21scheme vtu syllabus of visveraya technological university
21scheme vtu syllabus of visveraya technological university21scheme vtu syllabus of visveraya technological university
21scheme vtu syllabus of visveraya technological universityMohd Saifudeen
 
15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon15-Minute City: A Completely New Horizon
15-Minute City: A Completely New HorizonMorshed Ahmed Rahath
 
Seismic Hazard Assessment Software in Python by Prof. Dr. Costas Sachpazis
Seismic Hazard Assessment Software in Python by Prof. Dr. Costas SachpazisSeismic Hazard Assessment Software in Python by Prof. Dr. Costas Sachpazis
Seismic Hazard Assessment Software in Python by Prof. Dr. Costas SachpazisDr.Costas Sachpazis
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Ramkumar k
 
handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailingAshishSingh1301
 
Raashid final report on Embedded Systems
Raashid final report on Embedded SystemsRaashid final report on Embedded Systems
Raashid final report on Embedded SystemsRaashidFaiyazSheikh
 
Autodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxAutodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxMustafa Ahmed
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..MaherOthman7
 

Recently uploaded (20)

Seizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networksSeizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networks
 
Worksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxWorksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptx
 
What is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, FunctionsWhat is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, Functions
 
NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...
NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...
NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...
 
Diploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfDiploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdf
 
Independent Solar-Powered Electric Vehicle Charging Station
Independent Solar-Powered Electric Vehicle Charging StationIndependent Solar-Powered Electric Vehicle Charging Station
Independent Solar-Powered Electric Vehicle Charging Station
 
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
 
Intro to Design (for Engineers) at Sydney Uni
Intro to Design (for Engineers) at Sydney UniIntro to Design (for Engineers) at Sydney Uni
Intro to Design (for Engineers) at Sydney Uni
 
Interfacing Analog to Digital Data Converters ee3404.pdf
Interfacing Analog to Digital Data Converters ee3404.pdfInterfacing Analog to Digital Data Converters ee3404.pdf
Interfacing Analog to Digital Data Converters ee3404.pdf
 
Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...
 
engineering chemistry power point presentation
engineering chemistry  power point presentationengineering chemistry  power point presentation
engineering chemistry power point presentation
 
Artificial intelligence presentation2-171219131633.pdf
Artificial intelligence presentation2-171219131633.pdfArtificial intelligence presentation2-171219131633.pdf
Artificial intelligence presentation2-171219131633.pdf
 
21scheme vtu syllabus of visveraya technological university
21scheme vtu syllabus of visveraya technological university21scheme vtu syllabus of visveraya technological university
21scheme vtu syllabus of visveraya technological university
 
15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon
 
Seismic Hazard Assessment Software in Python by Prof. Dr. Costas Sachpazis
Seismic Hazard Assessment Software in Python by Prof. Dr. Costas SachpazisSeismic Hazard Assessment Software in Python by Prof. Dr. Costas Sachpazis
Seismic Hazard Assessment Software in Python by Prof. Dr. Costas Sachpazis
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailing
 
Raashid final report on Embedded Systems
Raashid final report on Embedded SystemsRaashid final report on Embedded Systems
Raashid final report on Embedded Systems
 
Autodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxAutodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptx
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..
 

Presentationeng

  • 1. June 13, 2018 MASSE ANALYSIS MODULES Experimental Results (6 mos) Alexander Zhdanov MASSE TAMIS Inria Rennes- Bretagne Atlantique
  • 2. MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 2 introduction The purpose of the presentation is to summarize experimental results for the first 6 months of research. It gives explanations about algorithms used, experimental setup and datasets together with analysis of the resulting output.
  • 3. Outline Problem formulation MASSE-Overview Yara rules two algorithms n-gram based Markov model difference (baseline) Genetic Algorithm (GA) experiments setup conclusions and discussion MASSE analysis modules: experimental results (6 mos) alexander zhdanov June 13, 2018- 3
  • 4. MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 4 1Problem formulation MASSE - Yara rules
  • 5. Problem formulation MASSE-Overview MASSE-Overview MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 5
  • 6. Problem formulation Yara rules Yara rules YARA library and scanner is a defacto standard in malware signature scanning for files The YARA signature rule format is an easy-to-understand DSL with a C-like syntax MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 6
  • 7. Problem formulation Yara rules Yara rules 1 rule silent_banker : banker { 3 meta: description = "This is just an example" 5 thread_level = 3 in_the_wild = true 7 strings: $a = {6A 40 68 00 30 00 00 6A 14 8D 91} 9 $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9} $c = " UVODFRYSIHLNWPEJXQZAKCBGMT " 11 condition: $a or $b or $c 13 } MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 7
  • 8. Problem formulation Yara rules Yara rules The yara rules contain the following sections: metadata: additional information about the rule strings: hexadecimal strings, text and regular expressions conditions: boolean expressions (with variables) the pattern matching swiss army knife Usage: yara [OPTION]... RULES FILE FILE | DIR | PID MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 8
  • 9. Problem formulation Yara rules Yara rules pros: easy-to read and understand fast classification (string (pattern) matching) fast sharing and update of yara-database (virus-total) cons: Static signatures are not prone to malware mutation, packing, obfuscation Yara-rules are written manually (performance, optimality) MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 9
  • 10. Problem formulation Yara rules Yara rules Q: why YARA is so popular? MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 10
  • 11. Problem formulation Yara rules Yara rules A: regular expression syntax (wildcards) E2 34 ?? C8 A? FB F4 23 [4-6] 62 B4 F4 23 ( 62 B4 — 56 ) 45 FE 39 45 [10-] 89 00 MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 11
  • 12. Problem formulation Yara rules MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 12 2two algorithms n-gram based Markov model difference (baseline) & GA
  • 13. two algorithms n-gram based Markov model difference (baseline) n-gram based Markov model difference (baseline) difference of n-gram based Markov models MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 13
  • 14. two algorithms n-gram based Markov model difference (baseline) n-gram based Markov model difference (baseline) calculate n-gram Markov model for cleanware calculate n-gram Markov model for malware subtract two models (optional): subtract models for other malware families (diff) filter n-grams using two-step filtration: sort calculate entropy select top (number of bytes) MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 14
  • 15. two algorithms Genetic Algorithm (GA) Genetic Algorithm (GA) steps calculate n-gram Markov model of malware apply two step filtration sort calculate entropy generate a new population calculate f1 scores for the new population while condition for the termination is not reached: apply mutation apply crossover replace min elements in the population with children select best individuals MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 15
  • 16. two algorithms Genetic Algorithm (GA) Genetic Algorithm steps MASSE analysis modules: experimental results (6 mos) Alexander Zhdanov June 13, 2018- 16
  • 17. Two algorithms: Genetic Algorithm (GA). F1 score (binary classification): $\mathrm{precision} = \frac{tp}{tp + fp}$, $\mathrm{recall} = \frac{tp}{tp + fn}$, $F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$.
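As a worked example of the F1 definition on this slide, the following sketch computes precision, recall, and F1 from ground-truth labels and rule verdicts. The function name `f1_score` and the convention of returning 0.0 when a denominator is zero are assumptions made for the illustration.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels: 1 = malware (positive), 0 = cleanware (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0   # assumed convention when nothing is detected
    return 2 * precision * recall / (precision + recall)

# 4 malware and 2 cleanware files: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]
print(f1_score(y_true, y_pred))  # precision = recall = 0.75, so F1 = 0.75
```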
  • 18. Two algorithms: Genetic Algorithm (GA). F1 score (binary classification with rejection): $\mathrm{precision} = \frac{tp}{tp + fp}$, $\mathrm{recall} = \frac{tp}{tp + fn}$, $F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$.
  • 19. Two algorithms: Genetic Algorithm (GA). Stopping criteria and minimization function of the Genetic Algorithm: ((num_unique - self.config.num_unique_el) + (eval_score - self.config.max_score)) * ((num_cycles - self.config.max_num_cycles) + ((prev_score - eval_score) - self.config.prec)), where num_unique is the number of unique elements in the generation, eval_score is the average F1 score calculated for the current generation, num_cycles is the number of cycles for the current generation, and self.config.prec is the lower bound on the change of the minimization function.
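The expression on this slide can be evaluated directly, as in the sketch below. The slide does not state how the resulting value is compared to decide termination, so the sketch only computes it. The function name `stop_value` and the two intermediate variable names are hypothetical, and the default `prec=0.01` is an assumed placeholder; the other defaults are the parameter values listed in the experimental setup.

```python
def stop_value(num_unique, eval_score, num_cycles, prev_score,
               num_unique_el=2, max_score=1.0, max_num_cycles=10, prec=0.01):
    """Evaluate the minimization function from the slide, term by term."""
    # First factor: distance from the target number of unique elements and target score.
    population_term = (num_unique - num_unique_el) + (eval_score - max_score)
    # Second factor: remaining cycle budget and score change relative to `prec`.
    progress_term = (num_cycles - max_num_cycles) + ((prev_score - eval_score) - prec)
    return population_term * progress_term

# Generation 3 with 5 unique individuals, average F1 rising from 0.85 to 0.90.
print(stop_value(num_unique=5, eval_score=0.9, num_cycles=3, prev_score=0.85))
```

One property follows from the arithmetic alone: the product is zero when the generation has exactly `num_unique_el` unique elements and its average score equals `max_score`.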
  • 20. Section 3. Experimental results: 5 malware families
  • 21. Experiments: setup. Datasets. Cleanware: 10 ELF files. Malware (both packed and unpacked): 5 malware families: blihan, rebhip, viking, vmprotect (packer), zvuzona; 217 binaries in total; blihan and zvuzona are unpacked.
  • 22. Experiments: setup. Algorithmic parameters. n-gram based Markov model difference (baseline): number of bytes in an n-gram: 5. Genetic Algorithm (GA): number of individuals in a generation: 1000; number of selected individuals: 100; Gaussian distribution of chromosomes with parameters mu = 4, sigma = 1; number of bits in the mutation step: 2; max score to evaluate: 1.0; number of unique elements expected: 2; max number of cycles: 10.
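The GA parameters on this slide can be collected into a configuration object, as in the following sketch. It also shows one possible reading of "Gaussian distribution of chromosomes": the number of strings per individual is drawn from a normal distribution with mu = 4 and sigma = 1, then rounded and clipped. That reading is an assumption, as are the class name `GAConfig`, the function `initial_population`, and every field name other than `max_score`, `num_unique_el`, and `max_num_cycles`, which appear in the stopping-criteria slide.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class GAConfig:
    population_size: int = 1000   # individuals in a generation
    num_selected: int = 100       # selected individuals
    mu: float = 4.0               # mean number of strings per individual (assumed meaning)
    sigma: float = 1.0            # standard deviation of that number
    mutation_bits: int = 2        # bits changed in the mutation step
    max_score: float = 1.0        # max score to evaluate
    num_unique_el: int = 2        # unique elements expected
    max_num_cycles: int = 10      # max number of cycles

def initial_population(candidates, cfg, seed=0):
    """Draw individuals as sets of candidate strings with Gaussian-distributed size."""
    rng = random.Random(seed)
    population = []
    for _ in range(cfg.population_size):
        size = round(rng.gauss(cfg.mu, cfg.sigma))
        size = max(1, min(len(candidates), size))   # keep the size in a valid range
        population.append(frozenset(rng.sample(candidates, size)))
    return population

candidates = [bytes([i]) * 5 for i in range(20)]   # placeholder 5-byte n-grams
population = initial_population(candidates, GAConfig())
print(sum(len(ind) for ind in population) / len(population))
```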
  • 23. Experiments: setup. Prior distribution of individuals by the number of strings.
  • 24. Experiments: setup. F1 scores: binary classification.
  • 25. Experiments: setup. F1 scores: binary classification with rejection.
  • 26. Experiments: setup. Length of YARA rules (number of strings).
  • 27. Section 4. Conclusions and discussion
  • 28. Conclusions and discussion. Implemented construction of syntactic malware/cleanware Markov models based on n-grams. Implemented three algorithms for YARA rule generation: n-gram based Markov model difference (baseline); Genetic Algorithm (GA); n-gram based Markov model difference with multiple models (diff).
  • 29. Conclusions and discussion. Both baseline and GA work on unpacked malware (packed: packer signatures). The parameters of GA are chosen so that the algorithm does a fast evaluation. For GA, detection rates depend on the number of generated individuals in a population (higher coverage). For binary classification: GA has the same detection rate as baseline for blihan and zvuzona; for vmprotect, GA produces a higher detection rate; for rebhip and viking, GA has slightly lower detection rates (0.98/0.93 and 0.90/0.86).
  • 30. Conclusions and discussion. For binary classification with rejection: GA produces signatures with better detection rates than baseline (significantly better for rebhip and viking). The multiple models heuristic (diff) does not produce signatures on packed/obfuscated malware; for zvuzona, it produces the same detection rate as the Genetic Algorithm. Length of the produced YARA rules: on average, the Genetic Algorithm produces shorter YARA rules than baseline (4 vs. 39.2 strings).
  • 31. Conclusions and discussion: future work. Extend the cleanware dataset. Run tests on more malware families (this needs a good packing detector/extractor). Improve the Genetic Algorithm: run more experiments with higher parameter values (more cycles, more individuals, higher mutation rates, ...). Use more machine learning techniques: Hidden Markov Models (HMM). Thanks.