"Fast detection of Android malware with machine learning." Yury Leonychev, Yandex

The talk covers the use of machine learning algorithms to detect malicious Android applications. I will describe how a high-performance tool for this task was designed at Yandex on top of MatrixNet, and show the cases in which analytic malware detection methods help block many simple samples of malicious code. We will then discuss how such methods can be improved to detect more sophisticated malware.


  1. (no text)
  2. Fast detection of Android malware. Yury Leonychev
  3. Introduction
  4. Android application (APK):
     - Manifest (AndroidManifest.xml)
     - Code (classes.dex and native)
     - Meta information (META-INF)
     - Resources (files and resources.arsc)
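An APK is a ZIP archive, so the four parts named on this slide can be told apart by entry name alone. The sketch below is a minimal illustration using Python's standard `zipfile` module; the function names and the grouping rules (for example, treating everything under `lib/` as native code) are assumptions made for this example, not part of the talk.

```python
import io
import zipfile


def group_apk_entries(names):
    """Group archive entry names into the four APK parts (illustrative rules)."""
    groups = {"manifest": [], "code": [], "meta": [], "resources": []}
    for name in names:
        if name == "AndroidManifest.xml":
            groups["manifest"].append(name)
        elif name.endswith(".dex") or name.startswith("lib/"):
            # DEX bytecode plus native libraries (assumed to live under lib/)
            groups["code"].append(name)
        elif name.startswith("META-INF/"):
            groups["meta"].append(name)
        else:
            # Everything else: res/, assets/, resources.arsc, ...
            groups["resources"].append(name)
    return groups


def list_apk(path_or_file):
    """Open an APK (path or file-like object) and group its entries."""
    with zipfile.ZipFile(path_or_file) as apk:
        return group_apk_entries(apk.namelist())


# Build a tiny in-memory archive that mimics an APK layout.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    for entry in ("AndroidManifest.xml", "classes.dex", "lib/armeabi/libx.so",
                  "META-INF/MANIFEST.MF", "resources.arsc", "res/layout/main.xml"):
        archive.writestr(entry, b"")
demo.seek(0)
demo_groups = list_apk(demo)
print(demo_groups)
```

Reading only the archive directory keeps this step cheap, which matters when every uploaded application has to be checked.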
  5. Brief list of tools for APK analysis
     - Androguard (ultimate tool by @adesnos and others), used by VirusTotal, APKInspector, etc.
     - SCanDroid (Adam P. Fuchs, Avik Chaudhuri, and Jeffrey S. Foster)
     - TaintDroid (guys from Intel, Penn State University, Duke University)
     - DroidBox (dynamic analysis by Patrik Lantz), used by ApkScan
  6. Is this all? Really?
     - http://www.apk-analyzer.net
     - http://anubis.iseclab.org
     - http://apkscan.nviso.be
  7. Our task is more complex: malware detector
  8. Methods of malware detection: static analysis
     - Advantages
       - APK has predictable content. Application behavior can be learned by simply reading the file
       - Checks are safe
     - Limitations
       - Can be ineffective against sophisticated malware and obfuscation techniques
       - We cannot really tell, as we don't execute the app
  9. Methods of malware detection: dynamic analysis
     - Advantages
       - Clear results and interpretation
       - Open source solutions available
     - Limitations
       - Not fast (enough)
       - Can be detected and bypassed
       - Big ecosystem requires big infrastructure
  10. Methods of malware detection: signature analysis
     - Advantages
       - Effective for known malware
       - Commercial solutions available
     - Limitations
       - Signature databases require regular (and frequent) updates
       - Not effective for new malware
       - Do you have a team of virus analysts?
  11. Methods of malware detection: the most efficient way seems to be a hybrid solution
  12. MatrixNet. What is The Matrix?
  13. Why can we use machine learning? Abstract task description:
     - We have a set of objects (APK files). We should divide this set into two subsets (malware and normal)
     - For every element of the main set we can compute a predictable number of features
     - The subsets are just the result of a simple classification task, so we can try to choose effective features
  14. What is MatrixNet? MatrixNet is an implementation of the gradient boosted decision trees algorithm. It differs a bit from the standard one:
     - Uses oblivious trees
     - Accounts for the sample count in each leaf
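To make the term "oblivious tree" concrete: in such a tree every node on a given level tests the same feature against the same threshold, so a tree of depth d reduces to d comparisons that form a d-bit index into a table of leaf values. The sketch below shows this general technique only; it is not MatrixNet's actual implementation, and the function names and sample numbers are invented for illustration.

```python
def oblivious_tree_predict(x, splits, leaf_values):
    """Evaluate one oblivious tree.

    splits      -- one (feature_index, threshold) pair per tree level
    leaf_values -- list of 2 ** len(splits) leaf outputs
    """
    index = 0
    for feature, threshold in splits:
        # Each level contributes one bit to the leaf index.
        bit = 1 if x[feature] > threshold else 0
        index = (index << 1) | bit
    return leaf_values[index]


def ensemble_score(x, trees):
    """Boosted ensemble: the score is the sum of all tree outputs."""
    return sum(oblivious_tree_predict(x, splits, leaves) for splits, leaves in trees)


# Hypothetical two-level tree over a two-feature vector.
example_tree = ([(0, 1.0), (1, 0.5)], [-1.0, -0.5, 0.5, 1.0])
example_score = ensemble_score([3.0, 0.0], [example_tree])
print(example_score)
```

Because evaluation is a handful of comparisons and one table lookup per tree, scoring a feature vector stays fast even with many trees.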
  15. Why is MatrixNet powerful?
     - It is a machine learning algorithm for classification tasks
     - A key feature of this method is its resistance to overfitting
  16. MatrixNet post-learning optimization
  17. MatrixNet post-learning optimization. Copyright © 2013 by Sidney Harris.
  18. How does it work? Offline learning process:
     - Choosing features
     - Choosing samples
     - Manual classification (malware or not)
     - Learning on combined set of apps
     - Calculating mistakes
  19. Features. What kind of features to use:
     - Permissions
     - URIs in strings and other resources
     - Adware library usage
     - Obfuscation methods
     - …
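As a rough illustration of how the first two feature kinds could become numbers for a classifier, the sketch below turns a permission list and a list of strings into a fixed-length vector. The watched permissions, the URL pattern, and the vector layout are all assumptions chosen for this example; the talk does not specify the actual feature set.

```python
import re

# Hypothetical choice of permissions to track as binary features.
WATCHED_PERMISSIONS = (
    "android.permission.SEND_SMS",
    "android.permission.RECEIVE_SMS",
    "android.permission.CALL_PHONE",
)

# Simplified URL pattern, good enough for a sketch.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_features(permissions, strings):
    """Return [one flag per watched permission..., permission count, URL count]."""
    requested = set(permissions)
    features = [1.0 if p in requested else 0.0 for p in WATCHED_PERMISSIONS]
    features.append(float(len(requested)))
    features.append(float(sum(len(URL_RE.findall(s)) for s in strings)))
    return features


sample_vector = extract_features(
    ["android.permission.SEND_SMS", "android.permission.INTERNET"],
    ["see http://example.com/a and https://example.org", "no url here"],
)
print(sample_vector)
```

Keeping the vector length fixed for every application is what lets the same trained formula score any APK.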
  20. Samples and classification
     - Malware applications:
       - VirusTotal feed
       - Samples from malicious sites
     - Normal applications:
       - Manual testing
       - Trusted developers
       - Yandex applications
  21. (diagram labels) Normal, Malware, Features, MatrixNet, Learning, Formula, Features weight, Features cost
  22. Measuring of mistakes (diagram labels): Formula 1, Features cost 1; Formula N, Features cost N; Normal; Malware; Formula with cool confusion matrix and low cost
  23. Analyzer architecture. Fine! I'll go build my own casino, with blackjack and big data
  24. Main parts: Parsers, Analyzers, Oracle, Report
  25. In depth
     - APK
     - Parsers: ManifestParser, ResourceParser, MetaInfoParser, ClassesParser
     - Analyzers: PermissionAnalyzer, PackageAnalyzer, URLAnalyzer, ReflectionAnalyzer
     - Reports: XHTMLReporter, JSONReporter
     - Oracle: MatrixNet
  26. ManifestParser. Avoids some obfuscation methods:
     - HEUR:Backdoor.AndroidOS.Obad.a
  27. ManifestParser
       <?xml version="1.0" encoding="utf-8"?>
       <manifest ="singleTop" android:versionCode="2" ="2.0" android:installLocation="internalOnly" package="com.android.system.admin" xmlns:android="http://schemas.android.com/apk/res/android">
       <uses-permission ="android.permission.READ_LOGS" />
       <uses-permission ="android.permission.WAKE_LOCK" />
       …
       <uses-permission ="android.permission.RECEIVE_SMS" />
       <uses-permission ="android.permission.SEND_SMS" />
       <uses-permission ="android.permission.CALL_PHONE" />
  28. ClassesParser
     - Parser for DEX files
     - Internal DEX disassembler
     - Callgraph builder
     - Embeds “real” functions/variables names into disassembly listing
     - Builds a list of used procedures and functions
  29. ClassesParser. Disassembler: https://github.com/tracer0tong/de
       Example:
       ./de.py test1.dex.dat
       [[0, 'sget-object v0, {type} [{class}].{field} // field@2225'],
        [2, 'invoke-virtual v0 @13970 // {class}->{method}'],
        [5, 'move-result-object v0'],
        [6, 'check-cast v0, [{type_name}] // type@0958'],
        [8, 'return-object v0']]
  30. ReflectionAnalyzer: java.lang.reflect.*
     - Classes: Field, Method, etc.
     - Functions: getClass(), getDeclaredField(), etc.
  31. ReflectionAnalyzer. Output:
     - Report:
         There is some reflections usage:
         1@android.app.Activity->getContentResolver calls:
         598@java.lang.Class->forName
         2@android.app.Activity->onActivityResult calls:
         598@java.lang.Class->forName
     - Amount of reflection calls is a feature.
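The idea that the amount of reflection calls is a feature can be sketched as a pass over call graph edges. The example assumes the call graph is already available as (caller, callee) string pairs in the `Class->method` form seen in the report; the edge format, the list of reflection targets, and the function name are assumptions for illustration, not the analyzer's real interface.

```python
# Assumed set of reflection entry points, based on the names on the slides.
REFLECTION_METHODS = {
    "java.lang.Class->forName",
    "java.lang.Class->getDeclaredField",
    "java.lang.Class->getDeclaredMethod",
    "java.lang.Object->getClass",
}
REFLECTION_PREFIX = "java.lang.reflect."


def count_reflection_calls(call_edges):
    """Return (per-caller counts, total count) of calls into reflection APIs."""
    per_caller = {}
    for caller, callee in call_edges:
        if callee in REFLECTION_METHODS or callee.startswith(REFLECTION_PREFIX):
            per_caller[caller] = per_caller.get(caller, 0) + 1
    return per_caller, sum(per_caller.values())


edges = [
    ("android.app.Activity->getContentResolver", "java.lang.Class->forName"),
    ("android.app.Activity->onActivityResult", "java.lang.Class->forName"),
    ("android.app.Activity->onCreate", "android.util.Log->d"),
]
callers, reflection_total = count_reflection_calls(edges)
print(callers, reflection_total)
```

The total is the single number that would go into the feature vector, while the per-caller breakdown is the kind of detail a human-readable report can show.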
  32. Service architecture (diagram labels, shown twice): Nginx, Gunicorn, Flask, Celery, MongoDB
  33. Case study
  34. Let's try it on... Yandex.Store application feed:
     - More than 50K Android applications
     - More than 200 new/updated apps per week
     - Open for developers (no strict manual verification)
  35. Performance. Check timing: ~2 ms, ~0.25 s, ~4.5 min
  36. Performance. Number of checks:
     - More than 16,000 applications checked in 1 hour on 1 cluster node
  37. Confusion matrix (rows: fact; columns: meaning, i.e. the classifier's verdict)

                        Malware (Score > 0)   Normal (Score < 0)
       Malware          485 (97%)             15 (3%)
       Normal           25 (5%)               475 (95%)
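The four counts on the confusion-matrix slide are enough to derive the usual summary metrics. The short sketch below does that arithmetic; the function and key names are chosen for this example only.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Derive summary metrics from the four confusion-matrix cells."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),            # flagged apps that really are malware
        "recall": tp / (tp + fn),               # malware that was actually caught
        "false_positive_rate": fp / (fp + tn),  # normal apps wrongly flagged
    }


# Counts from the slide: 485 malware caught, 15 missed, 25 false alarms, 475 normal passed.
metrics = confusion_metrics(tp=485, fn=15, fp=25, tn=475)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On these counts the accuracy is 960 out of 1,000 samples, recall matches the 97% in the malware row, and the false positive rate matches the 5% in the normal row.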
  38. (Un)predictable results
     - Applications with the malicious adware library AirPush were classified as malware
     - But we have no special features for adware in the first version
  39. Conclusion. It’s alive… alive!
  40. It works!
     - Analytic methods work fine for detecting Android malware
     - Machine learning is not “rocket science” but a cool and effective instrument
     - Open API coming soon.
  41. Thanks for attention
  42. Yury Leonychev, Application Security Engineer, yleonychev@yandex-team.ru, tracer0tong. © Yandex LLC 2013
