App behavioral analysis using system calls

Prajit Kumar Das
Prajit Kumar DasPhD Candidate and Research Assistant at Ebiquity Research Group
App behavioral
analysis using system
calls
Prajit Kumar Das
Advisor: Anupam Joshi,
PhD
Co-Advisor: Tim Finin,
PhD
Problem statement
We present Heimdall, a framework
for studying system calls made by
mobile apps in order to determine an
app’s behavior class and match such
behavior to their stated purpose.
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 2
UMBC Ebiquity Research Group | MobiSec 2017
Image courtesy: Android App Market
Motivation: App issues
Tuesday, May 2, 2017 3
Motivation: Permission inadequacy
Pre-Marshmallow Marshmallow
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 4
Do you read privacy policies and do you
understand them?
Source: lifehacker
Motivation: Software Limitation
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 5
Related Work
• System calls used for software analysis Kosoresow’97
• Three areas of research for mobile app analysis
• Malware classification Zhou’12
• NLP techniques Pandtia’13, Gorla’14
• Taint tracking Enck’10
• Google PHA taxonomy Google Android Security’16
• PrivacyGrade: Grading The Privacy Of Smartphone Apps
• Expectation and purpose: understanding users’ mental models
of mobile app privacy through crowdsourcing. Lin’12
• Modeling users’ mobile app privacy preferences, Lin’14
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 6
Architecture
Download
module
Annotations
module
System call capture module
Mobile device
emulator
Feature generation module
Classification module
Classifier outputUMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 7
Dataset distribution
10 annotated categories, 20 Google Play categories – 75% tool and
productivity
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 8
Experimental setup
• 1560 apps
• 534 successfully executed
• strace can only be used on emulator
• UI/Application exerciser tool used
• Android 6.0.1 December 2015 build used
• SVM, MLP, J48 and NB used
• Best F1-score from MLP at 0.44
• 1-hot vectors – Call present or absent
• TF-IDF weight vectors – Uniqueness and
significance of system calls for app
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 9
Similar TF-IDF scores - Scientific calculator vs To-do list
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 10
Results: Google categories
• Google’s categories are
developer provided
• Could be misleading
• System calls could classify
behavior better
• Additional features
required
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 11
Results: Annotated classes
• System calls not totally
useless
• 1-hot vs TF-IDF
Classifier TF-IDF 1-hot
MLP 0.44 0.26
SVM-Poly 0.31 0.23
SVM-RBF 0.32 0.21
J48 0.27 0.31
NB 0.27 0.27
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 12
Challenges faced
• Annotating app behavior class
• Emulator instabilities
• App limitations/bugs
• User credentials required
• Multi-behavior apps
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 13
Conclusions
Can system calls be used to
distinguish between how an app
“behaves” and it’s perceived/stated
purpose?
• System calls – insufficient as
features
• Emulator – better ones required
• Coarser behavior classes
required
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 14
Future work
• System call bi-grams, tri-grams – for capturing call sequences
• Better emulator – Genymotion
• Malware classification – less scope (state-of-the-art is 96.7%)
• Generating contextual policies from behavior estimation
• Features planned for future:
• App descriptions
• App ratings
• PHA app analysis – require samples for Mobile Unwanted Software
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 15
Questions?
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 16
1 of 16

Recommended

Capturing policies for fine-grained access control on mobile devices by
Capturing policies for fine-grained access control on mobile devicesCapturing policies for fine-grained access control on mobile devices
Capturing policies for fine-grained access control on mobile devicesPrajit Kumar Das
534 views21 slides
Semantic Knowledge and Privacy in the Physical Web by
Semantic Knowledge and Privacy in  the Physical WebSemantic Knowledge and Privacy in  the Physical Web
Semantic Knowledge and Privacy in the Physical WebPrajit Kumar Das
563 views44 slides
ChatGPT and the Future of Work - Clark Boyd by
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
20.3K views69 slides
Getting into the tech field. what next by
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
5K views22 slides
Google's Just Not That Into You: Understanding Core Updates & Search Intent by
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
5.8K views99 slides
How to have difficult conversations by
How to have difficult conversations How to have difficult conversations
How to have difficult conversations Rajiv Jayarajah, MAppComm, ACC
4.3K views19 slides

More Related Content

Featured

The six step guide to practical project management by
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
36.6K views27 slides
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright... by
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
12.6K views21 slides
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present... by
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
55.4K views138 slides
12 Ways to Increase Your Influence at Work by
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
401.6K views64 slides
ChatGPT webinar slides by
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slidesAlireza Esmikhani
30.3K views36 slides

Featured(20)

The six step guide to practical project management by MindGenius
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
MindGenius36.6K views
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright... by RachelPearson36
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
RachelPearson3612.6K views
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present... by Applitools
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Applitools55.4K views
12 Ways to Increase Your Influence at Work by GetSmarter
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
GetSmarter401.6K views
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G... by DevGAMM Conference
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
DevGAMM Conference3.6K views
Barbie - Brand Strategy Presentation by Erica Santiago
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
Erica Santiago25.1K views
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well by Saba Software
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Saba Software25.2K views
Introduction to C Programming Language by Simplilearn
Introduction to C Programming LanguageIntroduction to C Programming Language
Introduction to C Programming Language
Simplilearn8.4K views
The Pixar Way: 37 Quotes on Developing and Maintaining a Creative Company (fr... by Palo Alto Software
The Pixar Way: 37 Quotes on Developing and Maintaining a Creative Company (fr...The Pixar Way: 37 Quotes on Developing and Maintaining a Creative Company (fr...
The Pixar Way: 37 Quotes on Developing and Maintaining a Creative Company (fr...
Palo Alto Software88.3K views
9 Tips for a Work-free Vacation by Weekdone.com
9 Tips for a Work-free Vacation9 Tips for a Work-free Vacation
9 Tips for a Work-free Vacation
Weekdone.com7.2K views
How to Map Your Future by SlideShop.com
How to Map Your FutureHow to Map Your Future
How to Map Your Future
SlideShop.com275.1K views
Beyond Pride: Making Digital Marketing & SEO Authentically LGBTQ+ Inclusive -... by AccuraCast
Beyond Pride: Making Digital Marketing & SEO Authentically LGBTQ+ Inclusive -...Beyond Pride: Making Digital Marketing & SEO Authentically LGBTQ+ Inclusive -...
Beyond Pride: Making Digital Marketing & SEO Authentically LGBTQ+ Inclusive -...
AccuraCast3.4K views
Exploring ChatGPT for Effective Teaching and Learning.pptx by Stan Skrabut, Ed.D.
Exploring ChatGPT for Effective Teaching and Learning.pptxExploring ChatGPT for Effective Teaching and Learning.pptx
Exploring ChatGPT for Effective Teaching and Learning.pptx
Stan Skrabut, Ed.D.57.6K views
How to train your robot (with Deep Reinforcement Learning) by Lucas García, PhD
How to train your robot (with Deep Reinforcement Learning)How to train your robot (with Deep Reinforcement Learning)
How to train your robot (with Deep Reinforcement Learning)
Lucas García, PhD42.5K views
4 Strategies to Renew Your Career Passion by Daniel Goleman
4 Strategies to Renew Your Career Passion4 Strategies to Renew Your Career Passion
4 Strategies to Renew Your Career Passion
Daniel Goleman122K views
The Student's Guide to LinkedIn by LinkedIn
The Student's Guide to LinkedInThe Student's Guide to LinkedIn
The Student's Guide to LinkedIn
LinkedIn87.9K views

App behavioral analysis using system calls

  • 1. App behavioral analysis using system calls Prajit Kumar Das Advisor: Anupam Joshi, PhD Co-Advisor: Tim Finin, PhD
  • 2. Problem statement We present Heimdall, a framework for studying system calls made by mobile apps in order to determine an app’s behavior class and match such behavior to their stated purpose. UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 2
  • 3. UMBC Ebiquity Research Group | MobiSec 2017 Image courtesy: Android App Market Motivation: App issues Tuesday, May 2, 2017 3
  • 4. Motivation: Permission inadequacy Pre-Marshmallow Marshmallow UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 4
  • 5. Do you read privacy policies and do you understand them? Source: lifehacker Motivation: Software Limitation UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 5
  • 6. Related Work • System calls used for software analysis Kosoresow’97 • Three areas of research for mobile app analysis • Malware classification Zhou’12 • NLP techniques Pandtia’13, Gorla’14 • Taint tracking Enck’10 • Google PHA taxonomy Google Android Security’16 • PrivacyGrade: Grading The Privacy Of Smartphone Apps • Expectation and purpose: understanding users’ mental models of mobile app privacy through crowdsourcing. Lin’12 • Modeling users’ mobile app privacy preferences, Lin’14 UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 6
  • 7. Architecture Download module Annotations module System call capture module Mobile device emulator Feature generation module Classification module Classifier outputUMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 7
  • 8. Dataset distribution 10 annotated categories, 20 Google Play categories – 75% tool and productivity UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 8
  • 9. Experimental setup • 1560 apps • 534 successfully executed • strace can only be used on emulator • UI/Application exerciser tool used • Android 6.0.1 December 2015 build used • SVM, MLP, J48 and NB used • Best F1-score from MLP at 0.44 • 1-hot vectors – Call present or absent • TF-IDF weight vectors – Uniqueness and significance of system calls for app UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 9
  • 10. Similar TF-IDF scores - Scientific calculator vs To-do list UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 10
  • 11. Results: Google categories • Google’s categories are developer provided • Could be misleading • System calls could classify behavior better • Additional features required UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 11
  • 12. Results: Annotated classes • System calls not totally useless • 1-hot vs TF-IDF Classifier TF-IDF 1-hot MLP 0.44 0.26 SVM-Poly 0.31 0.23 SVM-RBF 0.32 0.21 J48 0.27 0.31 NB 0.27 0.27 UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 12
  • 13. Challenges faced • Annotating app behavior class • Emulator instabilities • App limitations/bugs • User credentials required • Multi-behavior apps UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 13
  • 14. Conclusions Can system calls be used to distinguish between how an app “behaves” and it’s perceived/stated purpose? • System calls – insufficient as features • Emulator – better ones required • Coarser behavior classes required UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 14
  • 15. Future work • System call bi-grams, tri-grams – for capturing call sequences • Better emulator – Genymotion • Malware classification – less scope (state-of-the-art is 96.7%) • Generating contextual policies from behavior estimation • Features planned for future: • App descriptions • App ratings • PHA app analysis – require samples for Mobile Unwanted Software UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 15
  • 16. Questions? UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 16

Editor's Notes

  1. RQ1: Can system calls be used to distinguish between how an app “behaves” and it’s perceived/stated purpose?
  2. Why should we care? Apps collect user data Emails, Messages, Documents, Sensor data – Highly Personal Data Can’t App permissions handle privacy and security of data? App permissions – “Take it or leave it” Is user okay with sharing location in public place not private place, no way to control that Use Privacy and Security module to implement context-dependent Rules
  3. play cat freq alarm_clock 128 battery_saver 93 drink_recipes 15 file_explorer 72 lunar_calendar 12 pdf_reader 22 scientific_calculator 61 to_do_list 102 video_playback 5 wifi_analyzer 24
  4. We had 61 scientific calculators and 102 to-do list apps in 534 that’s a combined percentage of 21% Very similar system calls but different app categories fstatat - get file status relative to a directory file descriptor ftruncate - truncate a file to a specified length pwrite - read from or write to a file descriptor at a given offset sigreturn - return from signal handler and cleanup stack frame sendmsg - send a message on a socket sigaction - examine and change a signal action
  5. System calls aren’t useless because the random choice for a 10 classes gives us a probability of 0.1 and we have 0.44 at best and 0.27 at worst Classifiers perform better with TF-IDF features; understandable because just like document classification presence of a system call is significant while presence of a unique system call is a better feature than presence of a particular system call or a group of system calls