REDNAGA
ANDROID MALWARE
AND MACHINE LEARNING
CALEB FENTON
08.24.2017
Dead Drop SF
WHO AM I
• Researcher @ SentinelOne
• Previously @ Lookout and @SourceClear
• Enjoy reading, cryptocurrency, economics
• Made Simplify and other Android tools
• @caleb_fenton
• github.com/CalebFenton
CALEB
WHO ARE WE
• rednaga.io
• Banded together by the love of 0days and hot sauces
• Collaborate and try to improve the community
• Disclosures / Code / Lessons on GitHub
• @RedNagaSec
• github.com/RedNaga
RED NAGA
TALK OVERVIEW
1. Machine learning overview
2. Using apkfile for feature extraction
3. Useful features for Android malware
4. Tips for building good models
MACHINE LEARNING
OVERVIEW
STEP 1: UNDERSTAND THE FORMAT
• Android apps come as APK files
• APKs are just ZIPs
• APKs are rich with variety
• Android manifest / binary XML
• Dalvik executables
• Signing certificates
• Other resources (icons, maps, sounds, …)
• Offensive & Defensive Android Reverse Engineering

github.com/rednaga/training/tree/master/DEFCON23
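Because an APK is an ordinary ZIP archive, its contents can be listed with the standard library alone. The following is a minimal sketch, not part of apkfile; the class name `ApkEntryLister` is made up for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists the entries of an APK by treating it as a ZIP.
final class ApkEntryLister {
    /** Returns the names of all entries in the archive at apkPath. */
    static List<String> listEntries(String apkPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(apkPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: ApkEntryLister <path-to-apk>");
            return;
        }
        // Typical entries include AndroidManifest.xml, classes.dex and META-INF/ files.
        for (String name : listEntries(args[0])) {
            System.out.println(name);
        }
    }
}
```

Heavily obfuscated APKs may contain malformed ZIP structures that a plain `ZipFile` rejects, which is one reason to prefer a hardened parser for real feature extraction.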
STEP 2: COLLECT SAMPLES
• Need lots of good and bad samples
• Diversity of good and bad is important
• Sample sources:
• VirusTotal, VirusShare, market crawlers, other researchers, friends
STEP 3: ENGINEER FEATURES
How do humans do it?
STEP 3: ENGINEER FEATURES
features = [has_beard]
STEP 3: ENGINEER FEATURES
Example 1
App label: MX Player Pro
Package: com.mxtech.videoplayer.pro
CN=Kim Jae hyun, O=MX Technologies, L=Seoul, ST=South Korea, C=KR
Example 2
App label: Google Service Updater
Package: it.googleandroid.updater
CN=GService inc, OU=G Service inc, O=G, L=New York, ST=New York, C=US
STEP 3: ENGINEER FEATURES
• Certificate details - common name, country, …
• Suspicious strings - “pm uninstall”, “google”
• Permissions - which ones and how many
• API calls - send SMS, load DEX file
• Overall app quality - default icons, typos
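As a rough illustration of how the bullets above become numbers a model can consume, here is a small sketch that turns permissions and strings into named numeric features. The class name, the feature names, and the two watch lists are assumptions chosen for the example; the inputs would come from whatever extractor is in use.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: maps raw app properties to named numeric features.
final class SimpleFeatureExtractor {
    // Illustrative watch lists only; a real system would derive these from data.
    static final List<String> WATCHED_PERMISSIONS = Arrays.asList(
            "android.permission.SEND_SMS",
            "android.permission.READ_CONTACTS");
    static final List<String> SUSPICIOUS_SUBSTRINGS = Arrays.asList("pm uninstall", "google");

    static Map<String, Double> extract(List<String> permissions, List<String> strings) {
        Map<String, Double> features = new LinkedHashMap<>();
        // "Which ones and how many": one count plus one 0/1 flag per watched permission.
        features.put("permission_count", (double) permissions.size());
        for (String permission : WATCHED_PERMISSIONS) {
            features.put("has_" + permission, permissions.contains(permission) ? 1.0 : 0.0);
        }
        // Count the strings that contain each suspicious substring (case-insensitive).
        for (String needle : SUSPICIOUS_SUBSTRINGS) {
            long hits = strings.stream()
                    .filter(s -> s.toLowerCase(Locale.ROOT).contains(needle))
                    .count();
            features.put("strings_containing_" + needle, (double) hits);
        }
        return features;
    }
}
```

A string such as "google" is only suspicious in context, for example in a package name that is not signed by Google, so these counts are inputs for a model rather than verdicts.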
STEP 4: BUILD AND TUNE MODELS
• Collect and prepare data
• Drop low value features
• Try many algorithms
• Train and blend multiple models
REVIEW
1. Understand the format
2. Collect samples
3. Engineer features (apkfile!)
4. Build and tune model
USING APKFILE
WHAT IS APKFILE?
• APK feature extraction library (Java)
• github.com/CalebFenton/apkfile
• Parses DEX files (dexlib2)
• Parses APK certificates
• Parses Android manifest (based on ArscBlamer)
• Hardened for use against obfuscation
• Everything is an object for easy inspection
EXAMPLE: ANDROID MANIFEST
ApkFile apkFile = new ApkFile("someapp.apk");
AndroidManifest androidManifest = apkFile.getAndroidManifest();

// Get some manifest properties
String packageName = androidManifest.getPackageName();
String appLabel = androidManifest.getApplication().getLabel();

// Print permission names
for (Permission permission : androidManifest.getPermissions()) {
    System.out.println("permission: " + permission.getName());
}

// Print exported services
for (Service service : androidManifest.getApplication().getServices()) {
    if (service.isExported()) {
        System.out.println("exported: " + service.getName());
    }
}
EXAMPLE: APK CERTIFICATE
ApkFile apkFile = new ApkFile("example-malware.apk");
Certificate certificate = apkFile.getCertificate();
Collection<Certificate.SubjectAndIssuerRdns> allRdns = certificate.getAllRdns();

// APK may be signed by multiple certificates
for (Certificate.SubjectAndIssuerRdns rdns : allRdns) {
    Map<String, String> subjectRdns = rdns.getSubjectRdns();

    // Get certificate subject CN and O properties
    System.out.println("Subject common name: " + subjectRdns.get("CN"));
    System.out.println("Subject organization: " + subjectRdns.get("O"));

    // Print all issuer properties
    System.out.println("Issuer RDNS: " + rdns.getIssuerRdns());
}
EXAMPLE: DALVIK EXECUTABLES
// Same ApkFile construction as in the previous examples
ApkFile apkFile = new ApkFile("example-malware.apk");
Map<String, DexFile> pathToDexFile = apkFile.getDexFiles();
for (Map.Entry<String, DexFile> e : pathToDexFile.entrySet()) {
    String path = e.getKey();
    DexFile dexFile = e.getValue();
    System.out.println("Analyzing " + path);
    dexFile.analyze();

    // Average cyclomatic complexity, also available for each method
    System.out.println("Cyclomatic complexity: " + dexFile.getCyclomaticComplexity());

    // Get API call counts over all methods
    // Trove maps avoid boxing, so incrementing counts is faster
    TObjectIntIterator<MethodReference> iterator = dexFile.getApiCounts().iterator();
    while (iterator.hasNext()) {
        iterator.advance();
        MethodReference methodRef = iterator.key();
        int count = iterator.value();
        // E.g. Ljava/lang/StringBuilder;->toString called 18 times
        System.out.println(methodRef + " called " + count + " times");
    }

    // Print op code histograms for each method
    for (Map.Entry<String, DexMethod> me : dexFile.getMethodDescriptorToMethod().entrySet()) {
        String methodDescriptor = me.getKey();
        // E.g. Lit/googleandroid/updater/a;->a(Ljava/lang/String;)Ljava/lang/String; op counts
        System.out.println(methodDescriptor + " op counts");
        DexMethod dexMethod = me.getValue();
        TObjectIntIterator<Opcode> opIter = dexMethod.getOpCounts().iterator();
        while (opIter.hasNext()) {
            opIter.advance();
            // E.g. MOVE_RESULT_OBJECT: 46
            System.out.println(" " + opIter.key() + ": " + opIter.value());
        }
    }
}
USEFUL FEATURES
ANDROID MANIFEST
• Has main launcher activity
• No launcher implies no user interaction
• Number of activity package paths
• Malicious activities injected?
• Permissions / number of permissions
• Good clue what app may do
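One way to read "number of activity package paths" is as the count of distinct packages that an app's activities live in; activities from an unrelated package can hint at injected code. A minimal sketch under that reading, assuming the activity names have already been resolved to fully qualified class names (the class and method names here are hypothetical):

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: counts the distinct packages that activities belong to.
final class ActivityPackagePaths {
    /** Returns the distinct package prefixes of the given fully qualified class names. */
    static Set<String> distinctPackages(List<String> activityClassNames) {
        Set<String> packages = new TreeSet<>();
        for (String name : activityClassNames) {
            int lastDot = name.lastIndexOf('.');
            // Names without a dot are treated as belonging to the default package.
            packages.add(lastDot < 0 ? "" : name.substring(0, lastDot));
        }
        return packages;
    }
}
```

The size of the returned set is the numeric feature; the set itself is useful when inspecting a sample by hand.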
APKID FEATURES
• “PEiD for Android” - detects compilers, packers, …
• Compiler - dx (native) / dexlib (modified)
• Anti-VM strings - avoiding VM analysis
• Build.MANUFACTURER, SIM operator, device ID, subscriber ID
• Detecting Pirated and Malicious Android Apps with APKiD

rednaga.io/2016/07/31/detecting_pirated_and_malicious_android_apps_with_apkid/
STRINGS
• Number of gibberish strings
• Find weird certificate details
• Find unusual obfuscation
• Using Markov Chains for Android Malware Detection

calebfenton.github.io/2017/08/23/using-markov-chains-for-android-malware-detection/
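To make the gibberish idea concrete, here is a small character-bigram Markov model with add-one smoothing. It is a sketch of the general technique and not necessarily the approach in the linked post; the class name and the 27-symbol alphabet are assumptions. Strings whose character transitions were rarely seen in training receive a lower average log-probability.

```java
// Hypothetical sketch: character-bigram Markov model for scoring gibberish.
final class BigramGibberishScorer {
    // 'a'..'z' plus one shared bucket for every other character.
    private static final int ALPHABET = 27;
    private final int[][] counts = new int[ALPHABET][ALPHABET];
    private final int[] rowTotals = new int[ALPHABET];

    private static int index(char c) {
        char lower = Character.toLowerCase(c);
        return (lower >= 'a' && lower <= 'z') ? lower - 'a' : ALPHABET - 1;
    }

    /** Counts character-to-character transitions in known-good text. */
    void train(Iterable<String> samples) {
        for (String s : samples) {
            for (int i = 0; i + 1 < s.length(); i++) {
                int a = index(s.charAt(i));
                int b = index(s.charAt(i + 1));
                counts[a][b]++;
                rowTotals[a]++;
            }
        }
    }

    /** Average log-probability per transition; lower means less like the training text. */
    double score(String s) {
        if (s.length() < 2) {
            return 0.0;
        }
        double logProb = 0.0;
        for (int i = 0; i + 1 < s.length(); i++) {
            int a = index(s.charAt(i));
            int b = index(s.charAt(i + 1));
            // Add-one smoothing keeps unseen transitions from producing log(0).
            logProb += Math.log((counts[a][b] + 1.0) / (rowTotals[a] + ALPHABET));
        }
        return logProb / (s.length() - 1);
    }
}
```

In practice the model would be trained on a large corpus of strings from benign apps, and "number of gibberish strings" would be the count of strings scoring below a threshold chosen by experiment.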
TIPS FOR BUILDING
GOOD MODELS
TIPS
• Most guides are for toy data sets
• No one talks about large data set problems
• Everyone assumes you have a dense matrix
• Assuming sklearn, but applies to other libs
PREPARING DATA
• Normalization is important
• Scale with MaxAbs or MinMax if many 0s
• Needed for some algorithms (not decision trees)
• Needed for dropping invariant features
• Drop invariant features
• Reduces chance of overfitting
• Example: file hash, app label, rare API calls
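The two preparation steps can be sketched without any library. Max-abs scaling divides each column by its largest absolute value, so zeros stay zero and sparse data stays sparse. The second helper drops only columns that are exactly constant, which is the simplest possible form of variance-based filtering. The class and method names are hypothetical; sklearn users would more likely reach for `MaxAbsScaler` and `VarianceThreshold`.

```java
import java.util.Arrays;

// Hypothetical sketch: max-abs scaling and constant-column detection.
final class FeaturePrep {
    /** Returns a copy of rows with each column divided by its maximum absolute value. */
    static double[][] maxAbsScale(double[][] rows) {
        int cols = rows.length == 0 ? 0 : rows[0].length;
        double[] maxAbs = new double[cols];
        for (double[] row : rows) {
            for (int j = 0; j < cols; j++) {
                maxAbs[j] = Math.max(maxAbs[j], Math.abs(row[j]));
            }
        }
        double[][] scaled = new double[rows.length][cols];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < cols; j++) {
                // All-zero columns are left as zeros instead of dividing by zero.
                scaled[i][j] = maxAbs[j] == 0.0 ? 0.0 : rows[i][j] / maxAbs[j];
            }
        }
        return scaled;
    }

    /** Returns the indices of columns that take more than one distinct value. */
    static int[] nonConstantColumns(double[][] rows) {
        int cols = rows.length == 0 ? 0 : rows[0].length;
        int[] keep = new int[cols];
        int kept = 0;
        for (int j = 0; j < cols; j++) {
            for (int i = 1; i < rows.length; i++) {
                if (rows[i][j] != rows[0][j]) {
                    keep[kept++] = j;
                    break;
                }
            }
        }
        return Arrays.copyOf(keep, kept);
    }
}
```

Scaling first puts every feature on a comparable range, which is what makes a single variance cutoff meaningful across features.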
SELECTING FEATURES
• Score features and plot scores to build intuition
• Usually long tail of useless features
• Gives ideas for new features
• Top 100 features almost as good as top 1000
• Run experiments with subsets of features
• Improves speed
• Only interested in relative differences
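Scoring and ranking can be illustrated with a deliberately simple stand-in score: the absolute difference between a feature's mean in malicious samples and its mean in benign samples. This is an assumption made for the example, not the scoring used in the talk, and it presumes the features are already scaled to comparable ranges. Labels are assumed to be 1 for malicious and 0 for benign.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: score features by class-mean difference and keep the top k.
final class FeatureRanker {
    /** Returns |mean(feature | label 1) - mean(feature | label 0)| for each column. */
    static double[] meanDifferenceScores(double[][] rows, int[] labels) {
        int cols = rows.length == 0 ? 0 : rows[0].length;
        double[] sumPos = new double[cols];
        double[] sumNeg = new double[cols];
        int nPos = 0;
        int nNeg = 0;
        for (int i = 0; i < rows.length; i++) {
            boolean positive = labels[i] == 1;
            if (positive) {
                nPos++;
            } else {
                nNeg++;
            }
            for (int j = 0; j < cols; j++) {
                if (positive) {
                    sumPos[j] += rows[i][j];
                } else {
                    sumNeg[j] += rows[i][j];
                }
            }
        }
        double[] scores = new double[cols];
        for (int j = 0; j < cols; j++) {
            // Math.max guards against division by zero when a class is empty.
            scores[j] = Math.abs(sumPos[j] / Math.max(nPos, 1) - sumNeg[j] / Math.max(nNeg, 1));
        }
        return scores;
    }

    /** Returns the indices of the k highest-scoring features, best first. */
    static int[] topK(double[] scores, int k) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));
        int n = Math.min(k, order.length);
        int[] top = new int[n];
        for (int i = 0; i < n; i++) {
            top[i] = order[i];
        }
        return top;
    }
}
```

Sorting the scores and plotting them is usually enough to see the long tail, and `topK` gives the smaller subset to run faster experiments on.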
BUILDING MODELS
• Grid search to find best algorithms and parameters
• Iterate on several, smaller searches
• Decision tree ensembles aren’t hip, but work well

sentinelone.com/blog/detecting-malware-pre-execution-static-analysis-machine-learning/
• Build and blend multiple models

sentinelone.com/blog/measuring-the-usefulness-of-multiple-models/
• Feature Selection and Grid Searching Hyper-parameters

gist.github.com/CalebFenton/66aa04af7b4a4d98efca059cb8c2e7aa
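Blending can be as simple as a weighted average of each model's predicted probability for a sample. The sketch below assumes every model outputs a malware probability in [0, 1]; the class name and the weighting scheme are illustrative, and other blending strategies such as stacking exist.

```java
// Hypothetical sketch: blend several models by weighted-averaging their probabilities.
final class ModelBlender {
    /** Returns the weighted average of per-model malware probabilities for one sample. */
    static double blend(double[] probabilities, double[] weights) {
        if (probabilities.length != weights.length || probabilities.length == 0) {
            throw new IllegalArgumentException("need one weight per probability");
        }
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < probabilities.length; i++) {
            weightedSum += probabilities[i] * weights[i];
            totalWeight += weights[i];
        }
        if (totalWeight == 0.0) {
            throw new IllegalArgumentException("weights must not sum to zero");
        }
        return weightedSum / totalWeight;
    }
}
```

Equal weights are a reasonable starting point; weights can later be tuned on a held-out validation set, never on the test set.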
EXTENDED READING
https://github.com/rednaga/training/tree/master/DEFCON23
http://blog.datadive.net/selecting-good-features-part-i-univariate-selection/
https://rednaga.io/
https://calebfenton.github.io/
http://androidcracking.blogspot.com/
REDNAGA
08.24.2017
THANKS!
Dead Drop SF
CALEB FENTON
@CALEB_FENTON
QUESTIONS?

  • 6. STEP 1: UNDERSTAND THE FORMAT • Android apps come as APK files • APKs are just ZIPs • APKs are rich with variety • Android manifest / binary XML • Dalvik executables • Signing certificates • Other resources (icons, maps, sounds, …) • Offensive & Defensive Android Reverse Engineering
 github.com/rednaga/training/tree/master/DEFCON23
  • 7. STEP 2: COLLECT SAMPLES • Need lots of good and bad samples • Diversity of good and bad is important • Sample sources: • VirusTotal, VirusShare, market crawlers, other researchers, friends
  • 8. STEP 3: ENGINEER FEATURES How do humans do it?
  • 9. STEP 3: ENGINEER FEATURES features = [has_beard]
  • 10. STEP 3: ENGINEER FEATURES
        Example 1
          App label: MX Player Pro
          Package: com.mxtech.videoplayer.pro
          CN=Kim Jae hyun, O=MX Technologies, L=Seoul, ST=South Korea, C=KR
        Example 2
          App label: Google Service Updater
          Package: it.googleandroid.updater
          CN=GService inc, OU=G Service inc, O=G, L=New York, ST=New York, C=US
  • 11. STEP 3: ENGINEER FEATURES • Certificate details - common name, country, … • Suspicious strings - “pm uninstall”, “google” • Permissions - which ones and how many • API calls - send SMS, load DEX file • Overall app quality - default icons, typos
  • 12. STEP 4: BUILD AND TUNE MODELS • Collect and prepare data • Drop low value features • Try many algorithms • Train and blend multiple models
  • 13. REVIEW 1. Collect samples 2. Understand the format 3. Engineer features (apkfile!) 4. Build and tune model
  • 15. WHAT IS APKFILE? • APK feature extraction library (Java) • github.com/CalebFenton/apkfile • Parses DEX files (dexlib2) • Parses APK certificates • Parses Android manifest (based on ArscBlamer) • Hardened for use against obfuscation • Everything is an object for easy inspection
  • 16. EXAMPLE: ANDROID MANIFEST
        ApkFile apkFile = new ApkFile("someapp.apk");
        AndroidManifest androidManifest = apkFile.getAndroidManifest();

        // Get some manifest properties
        String packageName = androidManifest.getPackageName();
        String appLabel = androidManifest.getApplication().getLabel();

        // Print permission names
        for (Permission permission : androidManifest.getPermissions()) {
            System.out.println("permission: " + permission.getName());
        }

        // Print exported services
        for (Service service : androidManifest.getApplication().getServices()) {
            if (service.isExported()) {
                System.out.println("exported: " + service.getName());
            }
        }
  • 17. EXAMPLE: APK CERTIFICATE
        ApkFile apkFile = new ApkFile("example-malware.apk");
        Certificate certificate = apkFile.getCertificate();
        Collection<Certificate.SubjectAndIssuerRdns> allRdns = certificate.getAllRdns();

        // APK may be signed by multiple certificates
        for (Certificate.SubjectAndIssuerRdns rdns : allRdns) {
            Map<String, String> subjectRdns = rdns.getSubjectRdns();

            // Get certificate subject CN and O properties
            System.out.println("Subject common name: " + subjectRdns.get("CN"));
            System.out.println("Subject organization: " + subjectRdns.get("O"));

            // Print all certificate properties
            System.out.println("Issuer RDNS: " + rdns.getIssuerRdns());
        }
  • 18. EXAMPLE: DALVIK EXECUTABLES
        Map<String, DexFile> pathToDexFile = apkFile.getDexFiles();
        for (Map.Entry<String, DexFile> e : pathToDexFile.entrySet()) {
            String path = e.getKey();
            DexFile dexFile = e.getValue();
            System.out.println("Analyzing " + path);
            dexFile.analyze();

            // Average cyclomatic complexity, also available for each method
            System.out.println("Cyclomatic complexity: " + dexFile.getCyclomaticComplexity());

            // Get API call counts over all methods
            // Trove maps generally preferred for unboxing, incrementing performance
            TObjectIntIterator<MethodReference> iterator = dexFile.getApiCounts().iterator();
            while (iterator.hasNext()) {
                iterator.advance();
                MethodReference methodRef = iterator.key();
                int count = iterator.value();
                // E.g. Ljava/lang/StringBuilder;->toString called 18 times
                System.out.println(methodRef + " called " + count + " times");
            }

            // Print op code histograms for each method
            for (Map.Entry<String, DexMethod> me : dexFile.getMethodDescriptorToMethod().entrySet()) {
                String methodDescriptor = me.getKey();
                // E.g. Lit/googleandroid/updater/a;->a(Ljava/lang/String;)Ljava/lang/String; op counts
                System.out.println(methodDescriptor + " op counts");
                DexMethod dexMethod = me.getValue();
                TObjectIntIterator<Opcode> opIter = dexMethod.getOpCounts().iterator();
                while (opIter.hasNext()) {
                    opIter.advance();
                    // E.g. MOVE_RESULT_OBJECT: 46
                    System.out.println(" " + opIter.key() + ": " + opIter.value());
                }
            }
        }
  • 20. ANDROID MANIFEST • Has main launcher activity • No launcher implies no user interaction • Number of activity package paths • Malicious activities injected? • Permissions / number of permissions • Good clue what app may do
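
A rough illustration of the "number of activity package paths" feature: the sketch below counts the distinct packages that declare activities. It works on plain strings only. How the activity names are read from the manifest is left out, and the class name, method names, and sample activity names are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ActivityPackagePaths {
    // Returns the package part of a fully qualified class name ("" if there is none).
    public static String packageOf(String className) {
        int lastDot = className.lastIndexOf('.');
        return lastDot < 0 ? "" : className.substring(0, lastDot);
    }

    // Collects the distinct packages that declare at least one activity.
    public static Set<String> distinctPackages(List<String> activityNames) {
        Set<String> packages = new TreeSet<>();
        for (String name : activityNames) {
            packages.add(packageOf(name));
        }
        return packages;
    }

    public static void main(String[] args) {
        // Made-up activity names; the last one lives outside the app's own package.
        List<String> activities = Arrays.asList(
            "com.example.player.MainActivity",
            "com.example.player.SettingsActivity",
            "it.injected.payload.HiddenActivity");
        Set<String> packages = distinctPackages(activities);
        System.out.println("activity package paths: " + packages.size() + " " + packages);
    }
}
```

An app whose activities sit in several unrelated packages is not necessarily malicious (ad SDKs do this too), so the count is best treated as one feature among many.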
  • 21. APKID FEATURES • “PEiD for Android” - detects compilers, packers, … • Compiler - dx (native) / dexlib (modified) • Anti-VM strings - avoiding VM analysis • Build.MANUFACTURER, SIM operator, device ID, subscriber ID • Detecting Pirated and Malicious Android Apps with APKiD
 rednaga.io/2016/07/31/detecting_pirated_and_malicious_android_apps_with_apkid/
  • 22. STRINGS • Number of gibberish strings • Find weird certificate details • Find unusual obfuscation • Using Markov Chains for Android Malware Detection
 calebfenton.github.io/2017/08/23/using-markov-chains-for-android-malware-detection/
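
To make "number of gibberish strings" concrete, here is a minimal sketch of a character-level Markov chain scorer. It is not the implementation from the linked post: the class name, the 27-symbol alphabet, the add-one smoothing, and the tiny training set are all assumptions chosen to keep the example short.

```java
import java.util.Arrays;
import java.util.List;

public class GibberishScorer {
    private static final int N = 27; // 'a'..'z' plus one bucket for everything else
    private final double[][] logProb = new double[N][N];

    private static int index(char c) {
        char lower = Character.toLowerCase(c);
        return (lower >= 'a' && lower <= 'z') ? lower - 'a' : 26;
    }

    // Learns character-to-character transition probabilities from sample text.
    public GibberishScorer(List<String> trainingText) {
        double[][] counts = new double[N][N];
        for (double[] row : counts) {
            Arrays.fill(row, 1.0); // add-one smoothing so unseen pairs keep a small probability
        }
        for (String s : trainingText) {
            for (int i = 0; i + 1 < s.length(); i++) {
                counts[index(s.charAt(i))][index(s.charAt(i + 1))]++;
            }
        }
        for (int a = 0; a < N; a++) {
            double rowTotal = 0;
            for (double c : counts[a]) {
                rowTotal += c;
            }
            for (int b = 0; b < N; b++) {
                logProb[a][b] = Math.log(counts[a][b] / rowTotal);
            }
        }
    }

    // Average log-probability per transition; lower means less like the training text.
    public double score(String s) {
        if (s.length() < 2) {
            return 0.0;
        }
        double total = 0;
        for (int i = 0; i + 1 < s.length(); i++) {
            total += logProb[index(s.charAt(i))][index(s.charAt(i + 1))];
        }
        return total / (s.length() - 1);
    }

    public static void main(String[] args) {
        // Tiny made-up training set; a real model needs far more text.
        GibberishScorer scorer = new GibberishScorer(Arrays.asList(
            "the quick brown fox jumps over the lazy dog",
            "google service updater",
            "download the update and install the package"));
        for (String s : new String[] {"update service", "xqzjvkqwzx"}) {
            System.out.printf("%-16s %.3f%n", s, scorer.score(s));
        }
    }
}
```

Counting how many strings in an app (or in its certificate fields) fall below some score cutoff then gives a numeric feature. The cutoff would have to be tuned on real data.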
  • 24. TIPS • Most guides are for toy data sets • No one talks about large data set problems • Everyone assumes you have a dense matrix • Assuming sklearn, but applies to other libs
  • 25. PREPARING DATA • Normalization is important • Scale with MaxAbs or MinMax if many 0s • Needed for some algorithms (not decision trees) • Needed for dropping invariant features • Drop invariant features • Reduces chance of overfitting • Example: file hash, app label, rare API calls
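
The slides assume sklearn, which provides MaxAbsScaler and VarianceThreshold for these two steps. The sketch below only spells out the arithmetic in plain Java on a small dense array. The class name, method names, and sample matrix are hypothetical. MaxAbs scaling divides each column by its largest absolute value, so zeros stay zero and sparse data stays sparse. Scaling first also puts every column's variance on a comparable footing before a threshold is applied.

```java
import java.util.ArrayList;
import java.util.List;

public class PrepareData {
    // Divides each column by its largest absolute value; zeros stay zero.
    public static double[][] maxAbsScale(double[][] rows) {
        int cols = rows[0].length;
        double[] maxAbs = new double[cols];
        for (double[] row : rows) {
            for (int j = 0; j < cols; j++) {
                maxAbs[j] = Math.max(maxAbs[j], Math.abs(row[j]));
            }
        }
        double[][] scaled = new double[rows.length][cols];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < cols; j++) {
                scaled[i][j] = maxAbs[j] == 0 ? 0 : rows[i][j] / maxAbs[j];
            }
        }
        return scaled;
    }

    // Population variance of one column.
    public static double variance(double[][] rows, int col) {
        double mean = 0;
        for (double[] row : rows) {
            mean += row[col];
        }
        mean /= rows.length;
        double sumSquares = 0;
        for (double[] row : rows) {
            sumSquares += (row[col] - mean) * (row[col] - mean);
        }
        return sumSquares / rows.length;
    }

    // Indices of the columns whose variance is strictly above the threshold.
    public static List<Integer> columnsToKeep(double[][] rows, double threshold) {
        List<Integer> keep = new ArrayList<>();
        for (int j = 0; j < rows[0].length; j++) {
            if (variance(rows, j) > threshold) {
                keep.add(j);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // Made-up columns: permission count, a flag that never changes, an API call count.
        double[][] x = {
            {12, 1, 0},
            {0, 1, 0},
            {3, 1, 0},
            {0, 1, 40},
        };
        double[][] scaled = maxAbsScale(x);
        System.out.println("columns kept: " + columnsToKeep(scaled, 0.0));
    }
}
```

With a threshold of 0.0 only the constant column is dropped. Raising the threshold also removes columns that are nearly constant, such as an API call seen in only a handful of samples.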
  • 26. SELECTING FEATURES • Score features and plot scores to build intuition • Usually long tail of useless features • Gives ideas for new features • Top 100 features almost as good as top 1000 • Run experiments with subsets of features • Improves speed • Only interested in relative differences
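
A minimal sketch of scoring and ranking features, assuming binary features and a deliberately simple score: the absolute difference between how often a feature appears in malicious samples and in benign ones. This is a stand-in chosen for brevity, not the scoring function used in the talk, and the class name, method names, and data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FeatureScores {
    // |P(feature | malicious) - P(feature | benign)|; assumes both classes are present.
    public static double score(boolean[][] samples, boolean[] malicious, int feature) {
        double malWith = 0, malTotal = 0, benWith = 0, benTotal = 0;
        for (int i = 0; i < samples.length; i++) {
            if (malicious[i]) {
                malTotal++;
                if (samples[i][feature]) malWith++;
            } else {
                benTotal++;
                if (samples[i][feature]) benWith++;
            }
        }
        return Math.abs(malWith / malTotal - benWith / benTotal);
    }

    // Feature indices sorted from highest to lowest score (ties keep index order).
    public static List<Integer> rank(boolean[][] samples, boolean[] malicious) {
        int featureCount = samples[0].length;
        double[] scores = new double[featureCount];
        List<Integer> order = new ArrayList<>();
        for (int j = 0; j < featureCount; j++) {
            scores[j] = score(samples, malicious, j);
            order.add(j);
        }
        order.sort(Comparator.comparingDouble((Integer j) -> -scores[j]));
        return order;
    }

    public static void main(String[] args) {
        // Made-up features: requests SEND_SMS, requests INTERNET, loads a DEX file.
        boolean[][] samples = {
            {true, true, true},
            {true, true, false},
            {false, true, false},
            {false, true, false},
        };
        boolean[] malicious = {true, true, false, false};
        for (int j : rank(samples, malicious)) {
            System.out.println("feature " + j + " score " + score(samples, malicious, j));
        }
    }
}
```

Taking the first k entries of the ranking gives a feature subset for quick experiments. Plotting the sorted scores is what exposes the long tail of near-zero features.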
  • 27. BUILDING MODELS • Grid search to find best algorithms and parameters • Iterate on several, smaller searches • Decision tree ensembles aren’t hip, but work well
 sentinelone.com/blog/detecting-malware-pre-execution-static-analysis-machine-learning/
    • Build and blend multiple models
 sentinelone.com/blog/measuring-the-usefulness-of-multiple-models/
    • Feature Selection and Grid Searching Hyper-parameters
 gist.github.com/CalebFenton/66aa04af7b4a4d98efca059cb8c2e7aa
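
One simple way to blend models is to average the malware probability each one predicts. The sketch below assumes exactly that and is not necessarily the blending scheme described in the linked posts. The "models" are hard-coded stand-ins for trained classifiers, and every name and number is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class BlendModels {
    // Averages the malware probability predicted by each model.
    public static double blend(List<ToDoubleFunction<double[]>> models, double[] features) {
        double total = 0;
        for (ToDoubleFunction<double[]> model : models) {
            total += model.applyAsDouble(features);
        }
        return total / models.size();
    }

    public static void main(String[] args) {
        // Stand-ins for trained models; each maps a feature vector to a probability.
        ToDoubleFunction<double[]> permissionModel = f -> f[0] > 10 ? 0.9 : 0.2;
        ToDoubleFunction<double[]> stringModel = f -> f[1] > 0.5 ? 0.8 : 0.1;
        List<ToDoubleFunction<double[]>> models = Arrays.asList(permissionModel, stringModel);

        double[] sample = {14, 0.7}; // permission count, share of gibberish strings
        double p = blend(models, sample);
        System.out.printf("blended probability: %.2f (%s)%n", p, p >= 0.5 ? "malicious" : "benign");
    }
}
```

Averaging helps most when the individual models make different mistakes, for example when each is trained on a different feature family.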
  • 29. REDNAGA 08.24.2017 THANKS! Dead Drop SF CALEB FENTON @CALEB_FENTON QUESTIONS?