Sara Veterini
Sapienza, Università di Roma
Master in Engineering in Computer Science
Android Malware Classification with API call-grams
15/01/2018
Does Android malware exist?
Yes, and it is growing.
Symantec observed 18.4 million detections in 2016, double the number observed in 2015.
Why does Android malware exist?
• Smartphones contain sensitive data: it can be monetized!
• It is easy to develop: banking trojans are sold on the black market for $200
• It is easy to distribute: third-party stores
What is Google doing?
• Google Bouncer, an antivirus system that scans both new and existing apps on the Google Play Store
• Google Play Protect, integrated into the Google Play application, which scans your device
But...
What can we do?
• Detection: determining whether a sample is malware or not
• Classification: determining which family a malware sample belongs to
Both detection and classification can be performed with machine learning techniques: they learn autonomously, with the help of a knowledge base of many analyzed samples
State of the art - Features
Distribution of works by feature used
Thesis contributions
• Malware classification system based on machine learning algorithms
• Static features only: API call-grams (apigrams), that is, ordered sequences of API calls extracted from the call graph of the application
What is an apigram?
• Example:
Proposed approach
• Extract the call graph of the apk with the Androguard library
• Create apigrams of length 3 from the graph
• Discard apigrams occurring in just one apk: they are not representative!
• Build a matrix of size N × M:
– each row of the matrix is a binary vector representing an apk a
– each element e_ag corresponds to a particular apigram ag
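The steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the call graph is a plain dictionary rather than an Androguard object, an apigram is taken to be any path of three consecutively called methods, and the names `apigrams`, `build_matrix`, and the toy apks are hypothetical.

```python
from collections import Counter

def apigrams(call_graph, n=3):
    """Return all n-node call paths in a graph given as {caller: [callees]}."""
    grams = set()

    def walk(path):
        if len(path) == n:
            grams.add(tuple(path))
            return
        for callee in call_graph.get(path[-1], []):
            walk(path + [callee])

    for start in call_graph:
        walk([start])
    return grams

def build_matrix(apk_graphs, n=3):
    """Build the binary apk-by-apigram matrix, dropping apigrams seen in one apk only."""
    per_apk = {name: apigrams(graph, n) for name, graph in apk_graphs.items()}
    counts = Counter(g for grams in per_apk.values() for g in grams)
    vocab = sorted(g for g, c in counts.items() if c > 1)  # keep shared apigrams
    names = sorted(per_apk)
    matrix = [[1 if g in per_apk[a] else 0 for g in vocab] for a in names]
    return names, vocab, matrix

# Toy call graphs (made up for illustration).
apk_graphs = {
    "a.apk": {"onCreate": ["getDeviceId"], "getDeviceId": ["sendTextMessage"]},
    "b.apk": {"onCreate": ["getDeviceId", "openConnection"],
              "getDeviceId": ["sendTextMessage"]},
    "c.apk": {"onCreate": ["openConnection"], "openConnection": ["getInputStream"]},
}
names, vocab, matrix = build_matrix(apk_graphs)
```

In this toy input the only apigram shared by more than one apk is (onCreate, getDeviceId, sendTextMessage), so the matrix has a single column; the apigram found only in c.apk is discarded.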
Proposed approach
• Apigrams are extracted with two levels of abstraction
• 1st level, complete apigrams, containing:
– Activity name where the method is called
– Method name
– Method descriptor
• 2nd level, abstracted apigrams:
– Activity name
– Method name
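A small sketch of the two abstraction levels, assuming each call is stored as an (activity, method, descriptor) triple. The `Call` type, the string layout with `->` and `|`, and the sample values are hypothetical and not taken from the thesis.

```python
from typing import NamedTuple

class Call(NamedTuple):
    activity: str    # activity name where the method is called
    method: str      # method name
    descriptor: str  # method descriptor (parameter and return types)

def complete(call):
    """1st level: keep activity, method name and descriptor."""
    return f"{call.activity}->{call.method}{call.descriptor}"

def abstracted(call):
    """2nd level: keep only activity and method name."""
    return f"{call.activity}->{call.method}"

def apigram_string(calls, level):
    """Join the three calls of an apigram at the chosen abstraction level."""
    return "|".join(level(c) for c in calls)

# Two overloads of the same hypothetical method differ only in the descriptor.
one_arg = Call("MainActivity", "send", "(Ljava/lang/String;)V")
two_args = Call("MainActivity", "send", "(Ljava/lang/String;I)V")

same_when_complete = complete(one_arg) == complete(two_args)
same_when_abstracted = abstracted(one_arg) == abstracted(two_args)
```

The two calls stay distinct at the first level and collapse into one feature at the second, which is the trade-off the abstraction introduces: fewer, more general features at the cost of some detail.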
Proposed approach
• Before building the matrix, feature selection with the Chi-Square algorithm is applied
• Objective: reduce the number of features (which can number in the millions) and keep the best ones
• Classification is performed with the Random Forest and Decision Tree algorithms
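To make the feature-selection step concrete, here is a minimal pure-Python sketch of Chi-Square scoring for binary features. It is an assumption-laden illustration, not the thesis code: it computes the standard Pearson statistic over the feature-value by family contingency table, it works on a small in-memory matrix, and the names `chi_square` and `select_k_best` are hypothetical. The classification step itself is left out.

```python
from collections import Counter

def chi_square(feature_column, labels):
    """Pearson chi-square statistic between one binary feature and the class labels."""
    n = len(labels)
    class_totals = Counter(labels)
    value_totals = Counter(feature_column)
    observed = Counter(zip(feature_column, labels))
    score = 0.0
    for value, v_total in value_totals.items():
        for cls, c_total in class_totals.items():
            expected = v_total * c_total / n  # count expected under independence
            score += (observed[(value, cls)] - expected) ** 2 / expected
    return score

def select_k_best(matrix, labels, k):
    """Keep the k columns with the highest chi-square score."""
    n_features = len(matrix[0])
    scores = [chi_square([row[j] for row in matrix], labels)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in matrix]

# Toy data: 4 apks, 3 apigram features, 2 families (made up).
labels = ["A", "A", "B", "B"]
matrix = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
keep, reduced = select_k_best(matrix, labels, k=1)
```

Only the first column separates family A from family B, so it gets the highest score and is the one kept. The reduced matrix would then be passed to the classifier.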
Datasets
• Tests were run on two datasets.
• Drebin: contains 5,560 malware samples collected from 2010 to 2012. Many approaches have been tested on it.
• AndroZoo: a more up-to-date dataset; it currently contains 5,704,998 samples. For each application it gives the type and the family, collected from VirusTotal reports with an automatic tool.
Tests
• Drebin experiments:
– First: we classified the 9 biggest families using 100 samples from each, obtaining a perfectly balanced set, with both normal and abstracted apigrams.
– Second: we classified all samples of those 9 families, obtaining an unbalanced set.
• AndroZoo experiments (with both normal and abstracted apigrams):
– First: we classified 9 families of type “trojan”, 100 samples each; most of them are the same as the 9 families from Drebin.
– Second: we classified 9 families of different types, 100 samples each, to see how the two classifications differ.
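The balanced sets described above (the biggest families, a fixed number of samples from each) could be drawn along these lines. This is a sketch only: the sampling procedure actually used is not stated on the slides, so random sampling with a fixed seed is an assumption, and `balanced_subset` and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def balanced_subset(samples, n_families=9, per_family=100, seed=0):
    """Pick the n_families biggest families and draw per_family samples from each."""
    by_family = defaultdict(list)
    for sample_id, family in samples:
        by_family[family].append(sample_id)
    biggest = sorted(by_family, key=lambda f: len(by_family[f]),
                     reverse=True)[:n_families]
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return {f: rng.sample(by_family[f], min(per_family, len(by_family[f])))
            for f in biggest}

# Toy data: three families with 5, 4 and 1 samples (made up).
samples = ([(f"a{i}", "FamA") for i in range(5)]
           + [(f"b{i}", "FamB") for i in range(4)]
           + [("c0", "FamC")])
subset = balanced_subset(samples, n_families=2, per_family=3)
```

With the defaults (`n_families=9`, `per_family=100`) the same function matches the sizes quoted on the slide; the unbalanced Drebin experiment corresponds to keeping every sample of the selected families instead.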
Results – Drebin
[Figure not recovered; labels: Features selected, Accuracy]
Results – AndroZoo – 9 trojan families
[Figure not recovered; labels: Features selected, Accuracy]
Results – AndroZoo – 9 different families
[Figure not recovered; labels: Features selected, Accuracy]
Conclusions and future work
• Accuracies of the tests with trojans only fall in the range 82-87%
• Accuracies with families of different types reach 94-95%
• One direction for future work is to repeat the tests I ran on trojans with other types of malware
• It would also be interesting to explore more levels of abstraction of apigrams, on those other types of malware as well, to see which information must be kept in the string to reach the highest accuracy
