This malware looks familiar: Laymen Identify Malware
Run-time Similarity with Chernoff faces and Stick Figures
© 2016 Carnegie Mellon University
[DISTRIBUTION STATEMENT A] This material has been approved
for public release and unlimited distribution.
BICT 2017
Software Engineering Institute
Carnegie Mellon University
Pittsburgh, PA 15213
BICT ‘17 -- Elli Kanal
Joint work with N. VanHoudnos, W. Casey, D. French, B. Lindauer, S. Moon,
P. Jansen, J. Carbonell, and E. Wright
Copyright 2016 Carnegie Mellon University
This material is based upon work funded and supported by the
Department of Defense under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the
operation of the Software Engineering Institute, a federally funded research and development center.
Any opinions, findings and conclusions or recommendations expressed in this material are those of the
author(s) and do not necessarily reflect the views of the United States Department of Defense.
NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE
MATERIAL IS FURNISHED ON AN “AS-IS” BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO
WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT
NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR
RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT
MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR
COPYRIGHT INFRINGEMENT.
[Distribution Statement A] This material has been approved for public release and unlimited distribution.
Please see Copyright notice for non-US Government use and distribution.
This material may be reproduced in its entirety, without modification, and freely distributed in written or
electronic form without requesting formal permission. Permission is required for any other use. Requests for
permission should be directed to the Software Engineering Institute at permission@sei.cmu.edu.
CERT® is a registered mark of Carnegie Mellon University.
DM-0004098
Motivation: Longitudinal Analysis of
Malware Families
[Figure: analysis workflow diagram. Labels: File; Reverse Engineering (Discovery, Refinement, Reflection); New Family; Artifact Catalog, in which Signature 1, Signature 2, Signature 3, … correspond to Files 1a 1b 1c 1d …, Files 2a 2b 2c 2d …, Files 3a 3b 3c 3d …. Callout: Which files hang together?]
Motivation: Would non-experts be fast
and cheap?
• Task: Given an exemplar, find other malware artifacts of the same class (family, behavior, etc.) in our existing catalog.
• Problem: Diversity and volume of incoming malware
  • Human analysis is far too expensive
  • Can’t run all tools on all samples
  • Malware variation is unpredictable in mode or frequency
  • “I’ll know it when I see it” is hard to quantify
[Figure: Total Artifacts Over Time (axes: Year, Artifacts)]
Features for malware classification
• Features from static analysis
  • Decompositional Techniques
    • *Section hashes
    • Resource hashes
  • Interpretive Techniques
    • Function hashes
    • *Mnemonic class histograms
    • *Import address table (IAT) hashes
• Features from runtime analysis
  • Host-based
    • *System call traces / call graphs
    • Filesystem operations
    • Registry operations
  • Network-based
* Explored in this project
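As an illustration of one of the starred features, a mnemonic class histogram can be sketched as follows. This is a minimal sketch, not the project's implementation: the mapping `MNEMONIC_CLASS`, the class names in `CLASSES`, and the sample `listing` are all hypothetical, chosen only to show the shape of the feature vector.

```python
from collections import Counter

# Hypothetical mnemonic-to-class mapping, for illustration only.
# The slides do not say which classes the project used.
MNEMONIC_CLASS = {
    "mov": "data", "push": "data", "pop": "data", "lea": "data",
    "add": "arith", "sub": "arith", "imul": "arith", "inc": "arith",
    "xor": "logic", "and": "logic", "or": "logic", "shl": "logic",
    "jmp": "control", "je": "control", "jne": "control",
    "call": "control", "ret": "control",
}
CLASSES = ("data", "arith", "logic", "control", "other")


def mnemonic_class_histogram(mnemonics):
    """Return the fraction of instructions that fall in each class."""
    counts = Counter(MNEMONIC_CLASS.get(m.lower(), "other") for m in mnemonics)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CLASSES)
    return [counts[c] / total for c in CLASSES]


# Placeholder disassembly listing (made-up, not from a real sample).
listing = ["push", "mov", "xor", "call", "add", "jne", "ret", "nop"]
hist = mnemonic_class_histogram(listing)
print(dict(zip(CLASSES, hist)))
```

Normalizing by the instruction count keeps the vector comparable across files of different sizes, which matters if the histograms are later fed to a similarity measure or a dimensionality reduction step.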
A proxy task
For validation, we need a learnable task and ground truth.
To stay close to the real data, we projected the samples into a four-dimensional PCA space and mapped those four dimensions onto the visual parameters of either stick figures or Chernoff faces.
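The projection step can be sketched roughly as below. This is a sketch under stated assumptions rather than the authors' pipeline: PCA is computed with a plain SVD in NumPy, the input matrix is random placeholder data, and the four glyph parameter names in `FACE_PARAMS` are hypothetical, since the slides do not say which facial or stick-figure features each dimension controlled.

```python
import numpy as np

# Hypothetical glyph parameters; the slides do not name the features
# that the four PCA dimensions were mapped to.
FACE_PARAMS = ("face_width", "eye_size", "mouth_curve", "brow_slant")


def pca_project(X, n_components=4):
    """Project the rows of X onto the top principal components (PCA via SVD)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def to_glyph_params(scores):
    """Min-max scale each PCA dimension to [0, 1] and attach a parameter name."""
    lo = scores.min(axis=0)
    span = scores.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    unit = (scores - lo) / span
    return [dict(zip(FACE_PARAMS, row)) for row in unit]


rng = np.random.default_rng(0)
features = rng.normal(size=(12, 10))  # placeholder: 12 samples, 10 features
scores = pca_project(features)
glyphs = to_glyph_params(scores)
print(glyphs[0])
```

Scaling each dimension to the unit interval is one simple way to hand the values to a drawing routine; any monotone mapping onto the glyph's parameter ranges would serve the same purpose.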
Creature Classification on Amazon Mechanical Turk (AMT)
Results
• The difficulty of a given stimulus is consistent across runs.
• Stick figures are harder to classify than Chernoff faces.
Expert labels vs. Turker labels
• “Ground Truth” shows an SVM trained with expert ground truth labels.
• “Turkers Avg” trains the classifier with layperson labels instead.
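Before layperson labels can replace expert labels as training targets, the votes for each sample have to be reduced to a single label. The slides do not say how the “Turkers Avg” labels were aggregated, so the sketch below assumes a simple majority vote and then measures agreement with the expert labels. It does not train an SVM, and all sample names, votes, and labels are made-up placeholders.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]


def agreement(labels_a, labels_b):
    """Fraction of shared samples on which two labelings agree."""
    shared = labels_a.keys() & labels_b.keys()
    if not shared:
        return 0.0
    return sum(labels_a[k] == labels_b[k] for k in shared) / len(shared)


# Placeholder data: three layperson votes per sample (assumption).
turker_votes = {
    "sample_1": ["A", "A", "B"],
    "sample_2": ["B", "B", "B"],
    "sample_3": ["C", "A", "C"],
    "sample_4": ["A", "B", "B"],
}
# Placeholder expert ground truth (assumption).
expert_labels = {
    "sample_1": "A",
    "sample_2": "B",
    "sample_3": "C",
    "sample_4": "A",
}

turker_labels = {name: majority_label(v) for name, v in turker_votes.items()}
print(agreement(turker_labels, expert_labels))
```

The aggregated `turker_labels` could then be passed to any classifier in place of `expert_labels`, which is the comparison the slide describes.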
Results
Simple visualizations can allow even completely untrained people to differentiate malware families!
The Real World
• Discovery found 23 files
• Manual reverse engineering was slow: only two files in 5 days
• Visualizing files with Chernoff faces immediately suggests groups of related files.
• Analysis burden shifts from forming candidate groups to verifying groups
• Faster and cheaper = happy clients
Conclusions
• Machine learning and human analysts provide complementary capabilities in malware analysis.
• Visualization of runtime features is surprisingly powerful, so much so that laypeople can label malware.
• Using more advanced dimensionality reduction, we can combine IAT hashes and t-SNE over mnemonic counts to achieve an order-of-magnitude reduction in analyst workload.
Contact Information
Eliezer Kanal
Technical Manager & Principal Researcher
Telephone: +1 412.268.5204
Email: ekanal@sei.cmu.edu
Editor's Notes
• Reflection is a judgment call.
• Goal: NOT to remove engineers, but to ASSIST engineers with discovery, refinement, and reflection.
• Discuss the cost of engineers; mention that static analysis is cheaper.
• Note that none of this is interpretable to a layperson; we need a proxy measure for these features.
• We wanted to confirm that faces are easier than other stimuli (the FFA, fusiform face area, hypothesis), and examined whether repeated trials had similar accuracy.
• Point out that faces are easier, which is expected based on what is known about the FFA.
• Labels are accurate.
• Comment to reviewers: there are only three cases here because only a few malware families were being chosen between for this (real-life) sample of malware.