Submitted To: Maam Tahira Mehboob
Presented By:
Anum Nisa
Sumaiya Arshad
MAY 18, 2016 | Machine Learning
ABOUT MALWARE & ITS DETECTION TECHNIQUES:
INTRODUCTION:
Malware is …
Malicious software
Virus, Spam, …
Increasing threats
*Continuous and increasing attacks on infrastructure
*Threats to business, national security & personal
security of PCs
Attacks are becoming more advanced and
sophisticated!
MALWARE Executables
Host vs Network based approaches
Limitations of existing techniques
-Signature-based approach
* Fails to detect zero-day attacks.
* Fails to detect threats with evolving capabilities
such as metamorphic and polymorphic malware.
-Anomaly-based approach
* Produces a high false positive rate.
-Supervised learning based approach
* Performs poorly on new and evolving malware.
* Building a classifier model is challenging due to the
diversity of malware classes, imbalanced
distribution, data imperfection issues, etc.
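The signature-based limitation listed above can be illustrated with a minimal sketch. It assumes a signature is simply the SHA-256 hash of the whole file, which is a simplification (real products typically match byte patterns as well). The names `KNOWN_SAMPLE`, `SIGNATURE_DB`, and `xor_encode` are hypothetical, and the "sample" is harmless placeholder text.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical "known bad" sample: harmless placeholder bytes.
KNOWN_SAMPLE = b"EXAMPLE-PAYLOAD-v1"

# Assumed signature database: hashes of previously seen samples.
SIGNATURE_DB = {sha256_hex(KNOWN_SAMPLE)}


def signature_match(data: bytes) -> bool:
    """Flag data only if its hash is already in the database."""
    return sha256_hex(data) in SIGNATURE_DB


def xor_encode(data: bytes, key: int) -> bytes:
    """Re-encode every byte with a one-byte XOR key (reversible)."""
    return bytes(b ^ key for b in data)


# A re-encoded variant carries the same content but different bytes.
variant = xor_encode(KNOWN_SAMPLE, 0x5A)

print(signature_match(KNOWN_SAMPLE))  # the known sample is found
print(signature_match(variant))       # the variant is missed
```

Because any change to the bytes changes the hash, a sample that re-encodes itself on each iteration, or one that has never been seen before, has no entry in the database.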
Red Hocks (Viruses)
Our Goal
Machine Learning based approach
-Two level:
*Supervised learning approach to detect malicious
flows and then identify the specific malware type
*Combine unsupervised learning with supervised
learning to address the new class discovery problem
Proposed Framework
Two-level malware detection framework:
Macro-level classifier
Used to isolate malicious flows from the
non-malicious ones.
Micro-level classifier
Further categorizes the malicious flows into
one of the known malware classes or a new
malware class
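The two-level idea can be sketched as follows. This is not the framework's actual model: it swaps in a nearest-centroid rule for both levels and a plain distance threshold as a stand-in for new class discovery. The class name `TwoLevelDetector`, the two-dimensional flow features, and the `new_class_radius` parameter are all assumptions made for illustration.

```python
import math


def centroid(points):
    """Mean point of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest(x, centroids):
    """Return (label, distance) of the centroid closest to x."""
    return min(
        ((label, math.dist(x, c)) for label, c in centroids.items()),
        key=lambda pair: pair[1],
    )


class TwoLevelDetector:
    def __init__(self, benign, malware_by_class, new_class_radius):
        all_malicious = [p for pts in malware_by_class.values() for p in pts]
        # Macro level: benign versus malicious.
        self.macro = {
            "benign": centroid(benign),
            "malicious": centroid(all_malicious),
        }
        # Micro level: one centroid per known malware class.
        self.micro = {name: centroid(pts) for name, pts in malware_by_class.items()}
        self.new_class_radius = new_class_radius

    def classify(self, x):
        label, _ = nearest(x, self.macro)
        if label == "benign":
            return "benign"
        name, distance = nearest(x, self.micro)
        # Too far from every known class: report a possible new class.
        return name if distance <= self.new_class_radius else "new malware"


# Hypothetical flow features, invented for this sketch.
detector = TwoLevelDetector(
    benign=[(0, 0), (1, 0), (0, 1)],
    malware_by_class={
        "worm": [(10, 10), (11, 10), (10, 11)],
        "trojan": [(10, 0), (11, 0), (10, 1)],
    },
    new_class_radius=2.0,
)

for flow in [(0.5, 0.5), (10, 10), (10.5, 0.5), (20, 5)]:
    print(flow, detector.classify(flow))
```

Only flows that pass the macro level reach the micro level, so the more detailed classification runs on a much smaller set of flows.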
Proposed Framework Block diagram
Classification Process
Machine learning, data mining, and text classification
methods used to detect malicious executables
include:
Classifying executables as unknown or malicious using
ML algorithms
Random Forest classifier
Boosted J48 decision tree
KNN, Naïve Bayes, SVM, Multilayer
Perceptron (MLP)
Mal-ID basic detection algorithm
Both the Bayes network and random forest
classifiers produced more accurate results,
but the boosted decision tree (J48) was the best classifier.
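To show what "boosted" means here, the following is a minimal AdaBoost sketch. It uses one-level decision stumps as the weak learner in place of full J48 trees, and a one-dimensional toy dataset in which +1 stands for malicious and -1 for benign. The data, the function names, and the number of rounds are assumptions for illustration, not the evaluated setup.

```python
import math


def stump_predict(stump, x):
    """A stump predicts `polarity` when x[feature] > threshold, else the opposite."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity


def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(
                    wi
                    for xi, yi, wi in zip(X, y, w)
                    if stump_predict(stump, xi) != yi
                )
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds):
    """Train `rounds` weighted stumps; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(stump, xi))
            for xi, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1


# Toy data: no single threshold separates the two labels.
X = [(v,) for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = adaboost(X, y, rounds=1)
boosted = adaboost(X, y, rounds=3)
print(sum(predict(single, x) == t for x, t in zip(X, y)), "of 10 correct with 1 stump")
print(sum(predict(boosted, x) == t for x, t in zip(X, y)), "of 10 correct with 3 stumps")
```

Each round concentrates on the samples the previous stumps got wrong, which is why the combined vote can fit data that no individual weak learner can.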
Experimental Evaluation
Our analysis shows that among the three major forms of
malware (computer viruses, Internet worms,
and Trojan horses), Trojans are the most dangerous.
ANALYSIS
This section introduces analysis techniques for mobile
and PC malware. It transfers well-known techniques
from the common computer world to the platforms of
mobile devices.
The main idea of dynamic analysis is executing a given
sample in a controlled environment, monitoring its behavior,
and obtaining information about its nature and purpose.
This is especially important in the field of malware research
because a malware analyst must be able to assess a program’s
threat and create proper counter-measures.
While static analysis might provide more precise results, the
sheer mass of newly emerging malware each day makes it
impossible to conduct a static analysis for even a small
portion of today’s malware.
ANALYSIS OF PARAMETERS:
To analyze malware detection techniques, some
evaluation parameters are used to assess quality
factors (non-functional requirements):
Category/Type of Virus
Detection Techniques
Algorithm/ Technology/ Mechanism
Best Classification methodology
Evaluation criterion
Implementation Tools
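For the "Evaluation criterion" parameter, a minimal sketch of commonly used measures is given below. The slides do not state which criteria were actually applied, so accuracy, detection rate, false positive rate, and precision are assumed here, and the label lists are invented.

```python
def confusion_counts(y_true, y_pred, positive="malicious"):
    """Count true/false positives and negatives for one positive label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Derive the usual rates, guarding against empty denominators."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Invented ground truth and predictions for eight samples.
y_true = ["malicious"] * 3 + ["benign"] * 5
y_pred = ["malicious", "malicious", "benign", "malicious",
          "benign", "benign", "benign", "benign"]

counts = confusion_counts(y_true, y_pred)
result = rates(*counts)
print(counts)
print(result)
```

The false positive rate is the measure behind the anomaly-based limitation: it counts how many benign samples are wrongly flagged.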
Boosted J48 Decision Tree
J48 is an extension of ID3.
The additional features of J48 are:
accounting for missing values,
decision tree pruning,
continuous attribute value ranges,
derivation of rules, etc.
In the WEKA data mining tool, J48 is an
open source Java implementation of the
C4.5 algorithm.
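The handling of continuous attributes can be sketched as a threshold search scored by gain ratio, the split measure associated with C4.5. This is a simplified illustration rather than WEKA's J48 code: the real algorithm differs in details such as candidate thresholds, missing values, and pruning. The attribute values and labels are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split value <= threshold vs. value > threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    p_left, p_right = len(left) / n, len(right) / n
    gain = entropy(labels) - p_left * entropy(left) - p_right * entropy(right)
    split_info = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))
    return gain / split_info


def best_threshold(values, labels):
    """Try each distinct value except the largest as a threshold.

    Assumes the attribute has at least two distinct values.
    """
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Invented continuous attribute with two classes.
values = [1, 2, 3, 10, 11, 12]
labels = ["benign", "benign", "benign", "malicious", "malicious", "malicious"]

threshold = best_threshold(values, labels)
print(threshold, gain_ratio(values, labels, threshold))
```

ID3 only splits on categorical attributes, so this threshold step is one of the places where the extension shows.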
Conclusion:
We proposed an effective malware detection framework
based on data mining & machine learning techniques:
* Two-level ML-based classifier
* New class detection
* Encrypted data
A tree-based kernel for SVM was proposed to handle the
data imperfection issue in network flow data.
The boosted J48 decision tree classifier was identified as
the best classifier among a number of different classifiers.
Conclusion Contd:
This paper also compares the efficiency
of different malware detection techniques,
including KNN, Naïve Bayes, boosted J48, and SVM
(Support Vector Machine).
We explain the feasibility of some detection methods
and highlight the major causes of the increasing number of
malware files, but more research is necessary.
Future Works
Develop a hierarchical multi-class learning
method to enhance the testing efficiency when
the number of malware classes becomes
extremely large.
Malware detection accuracy can be
improved through further research into
classification algorithms and ways to label
malware data more accurately.
Most of the classifiers used are not
optimized for hardware operations or
applications. Additionally, hardware-oriented algorithm
design can increase accuracy and
efficiency.
Extra
* Metamorphic malware is rewritten with each iteration so
that each succeeding version of the code is different from
the preceding one. The code changes make it difficult for
signature-based antivirus software programs to recognize
that different iterations are the same malicious program.
* Polymorphic malware also makes changes to its code to avoid
detection. It has two parts, but one part remains the same
with each iteration, which makes the malware a little easier
to identify.
* Can you imagine that a piece of malware code can change its
shape and signature each time it appears, to make it
extremely hard for signature-based antivirus to detect it?
This is called polymorphic or metamorphic malware.
* Trojans can be employed by cyber-thieves and
hackers trying to gain access to users' systems. Users are
typically tricked by some form of social engineering into
loading and executing Trojans on their systems. Once
activated, Trojans can enable cyber-criminals to spy on you,
steal your sensitive data, and gain backdoor access to your
system. These actions can include:
* Deleting data
* Blocking data
* Modifying data
* Copying data
* Disrupting the performance of computers or computer networks

Malware Detection Using Machine Learning Techniques
