Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

5,395 views

Published on

There has been no shortage of Android malware analysis reports recently, but thus far that trend has not been accompanied with an equivalent scale of released public Android application tools or frameworks. To address this issue, we are presenting the Scalable Tailored Application Analysis Framework (STAAF), released as a new OWASP project for public use under Apache License 2.0. The goal of this framework is to allow a team of one or more analysts to efficiently analyze a large number of Android applications. In addition to large scale analysis, the framework aims to promote collaborative analysis through shared processing and results.

Our framework is designed using a modular and distributed approach, which allows each processing node to be highly tailored for a particular task. At the heart of the framework is the Resource Manager (RM) module, which is responsible for tracking samples, managing analysis modules, and storing results. The RM also serves to reduce processing time and data management through the deduplication of data and work, and it also aids with the scheduling of tasks so that they can be completed as a pipeline or as a single unit. When processing begins, the RM uses several default "primitive" modules that carry out the fundamental operations, such as extracting the manifest, transforming the Dalvik bytecode, and extracting application resources. The analysis modules then use the raw results to extract specific attributes such as permissions, receivers, invoked methods, external resources accessed, control flow graphs, etc., and these results are then stored in a distributed data store, after which the information can be queried for high level trends or targeted searches.

The modular nature of our framework allows independent analyses to happen on a per module basis, and the results of this data processing can be merged with other results at a later time. This design promotes an agile approach to large scale analysis, because it permits a wide array of analysis to happen distributively and in parallel. This means that teams with different needs or schedules can complete time-sensitive tasks separately with minimized data processing pipelines, while allowing more complex or time intensive tasks to be added later. Additionally, if analysis needs to be branched at some point in the pipeline, intermediate results can be retained and additional modules can be added leveraging the results from the past analysis steps. The results are also stored in a distributed database and designed to be queried using a map-reduce style query, which offers performance efficiencies as well as allowing the transparent inclusion of remote third party analysis databases. By using this plug-in style analysis framework, we are able to attain more efficient processing schedules and tailor the analysis for a specific need.

This framework is designed to be scalable and extensible, and the initial offering of this framework includes several modules...

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

  1. 1. STAAF An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis OWASP AppSec USA Thursday, September 22, 20111 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  2. 2. Allow Me to Introduce Myself Ryan W Smith VP Engineering at Praetorian  OWASP DFW Chapter Leader (2011)  Active member of The Honeynet Project (2002- )  8+ years of work with DoD, Intelligence Community, Federal/State/Local governments, and Fortune 500 companies2 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  3. 3. 3 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  4. 4. Presentation Roadmap • STAAF (Overview) • Background • STAAF (Deep Dive) • Results • Future Work • Conclusions4 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  5. 5. What can STAAF do for you? Observation #1: There are a lot of Android app analysis tools freely available BUT: They’re typically designed for single app analysis STAAF leverages the power of these tools as modules, And adds efficiency, scalability, data mgmt and sharing5 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  6. 6. What can STAAF do for you? Observation #2: Higher value analysis can be attained by analyzing large numbers of applications over long periods of time SOLUTION: Reduce the time and complexity for an analyst to process large numbers of apps Goal Analyze 50k apps in less than 2 days and make the extracted data readily available to analysts6 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  7. 7. What can STAAF do for you? Process Process Meaningful Apps Data Results Mod #1 Mod A Mod #2 Mod B Analytic Mod #3 Data Mod C Mod #N Mod Z Goal Minimize analysts’ effort to extract meaningful results from a large number of applications7 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  8. 8. What is STAAF8 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  9. 9. What is STAAF9 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  10. 10. What is STAAF10 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  11. 11. What is STAAF11 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  12. 12. What is STAAF12 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  13. 13. What STAAF is NOT• STAAF is not a stand alone application• STAAF is not only a malware detection or anti-virus engine• STAAF is not an application collection toolSTAAF is a problem agnostic app analysis framework13 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  14. 14. Presentation Roadmap • STAAF (Overview) • Background • STAAF (Deep Dive) • Results • Future Work • Conclusions14 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  15. 15. Android’s Open App Model • Low barrier to entry • Apps hosted and installed from anywhere • All apps are created equal • No distinction between core apps and 3rd party apps • Accept apps based on: 1. Trust of the source 2. Permissions requested15 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  16. 16. “Legitimate” Monitoring Apps Ad/Marketing Networks Social Gaming Networks16 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  17. 17. “Not-So-Legitimate” Permission Use SMS Trojan – Link to site hosting rogue app for “free movie player” – Sends 2 Premium SMS messages to a Kazakhstan number (about $5 per message) Gemini – Repackaged apps in Chinese markets – Sex positions and MonkeyJump2 are known examples – Central C&C – Exfiltrates unique device identifiers – Downloads and Install New Apps (with permission) DroidDream – Approx. 50 Malicious apps in official market – Central C&C – Exfiltrates unique device identifiers – Downloads additional code modules17 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  18. 18. Presentation Roadmap • STAAF (Overview) • Background • STAAF (Deep Dive) • Results • Future Work • Conclusions18 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  19. 19. STAAF Workflow Step 0: STAAF components initialized19 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  20. 20. STAAF Workflow Step 1: Users sends APKs to be processed20 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  21. 21. STAAF Workflow Step 2: Coordinator checks database for previous results and logs new instance data for each APK21 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  22. 22. STAAF Workflow Step 3: Coordinator sends new APKs to the file repository service22 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  23. 23. STAAF Workflow Step 4: Coordinator sends tasking orders to the task queue23 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  24. 24. STAAF Workflow Step 5: Elastic computing nodes pull tasks from their designated task queue24 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  25. 25. STAAF Workflow Step 6: Elastic computing nodes pull in the APK and related information25 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  26. 26. STAAF Workflow Step 6: After processing the elastic computing nodes push out processed files and analysis results26 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  27. 27. STAAF Workflow Step 7: When all tasking is complete elastic computing nodes notify the coordinator27 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  28. 28. Task Modules • Can be registered dynamically • Task-Oriented – High level • What % of apps use permission X COMPLEXITY • What is the mostPRIORITY common libraries used – Mid level • Extract Permissions • Extract static URLs • Extract Methods Called – Low level • Extract manifest • Extract Dex bytecode28 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  29. 29. Deduplication of Effort • All Intermediate data are cached for later use – Extract and convert manifest to ASCII – Extract Dex and convert to Smali and Java – Compute the control flow graph from the Dex • Libraries and shared resources must only be processed once • Apps must only be processed once by each module, everSmall savings matter at large scales29 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  30. 30. Distributed Data Sharing • Sharing app samples is just the beginning • Share the entire process: – Raw Application – Extracted Resources – Raw Data – Processed Data • Or set specific limits on what data is shared30 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  31. 31. Presentation Roadmap • STAAF (Overview) • Background • STAAF (Deep Dive) • Results • Future Work • Conclusions31 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  32. 32. Time Trials STAAF Performance Tests# Time Apps ECUs Nodes Database1 2h25m 500 1 1 Central2 2h00m 500 1 2 Central3 1h56m 500 1 4 Central4 0h36m 500 1 4 Local5 0h36m 500 5 1 Central6 0h28m 500 5 4 Central7 0h10m 500 5 4 Local8 0h27m 1722 5 5 Local9 1h19m 9349 5 10 Local Achieved 50k apps in ~7 hours* *Extrapolated from shorter tests 32 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  33. 33. Time Trials STAAF Performance Tests# Time Apps ECUs Nodes Database Central Central DB DB1 2h25m 500 1 1 Central2 2h00m 500 1 2 Central 1 1 Node Node3 1h56m 500 1 4 Central 5 1 ECU4 0h36m 500 1 4 Local ECUs5 0h36m 500 5 1 Central6 0h28m 500 5 4 Central7 0h10m 500 5 4 Local Compute Time Compute Time8 0h27m 1722 5 5 Local 2h25m 0h36m9 1h19m 9349 5 10 Local“One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.” -Amazon 33 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  34. 34. Time Trials STAAF Performance Tests# Time Apps ECUs Nodes Database Central Central DB DB1 2h25m 500 1 1 Central2 2h00m 500 1 2 Central 1 4 Node Nodes3 1h56m 500 1 4 Central 1 ECU 1 ECU4 0h36m 500 1 4 Local5 0h36m 500 5 1 Central6 0h28m 500 5 4 Central7 0h10m 500 5 4 Local Compute Time Compute Time8 0h27m 1722 5 5 Local 2h25m 1h56m9 1h19m 9349 5 10 Local STAAF is bound by both CPU and database throughput 34 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  35. 35. Time Trials STAAF Performance Tests# Time Apps ECUs Nodes Database Central Local DB DB1 2h25m 500 1 1 Central 4 42 2h00m 500 1 2 Central Nodes Nodes3 1h56m 500 1 4 Central 1 ECU 1 ECU4 0h36m 500 1 4 Local5 0h36m 500 5 1 Central6 0h28m 500 5 4 Central7 0h10m 500 5 4 Local Compute Time Compute Time8 0h27m 1722 5 5 Local 1h56m 0h36m9 1h19m 9349 5 10 Local By using distributed, local databases STAAF achieves a significant time performance increase 35 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  36. 36. Time Trials STAAF Performance Tests# Time Apps ECUs Nodes Database Central Local DB DB1 2h25m 500 1 1 Central2 2h00m 500 1 2 Central 1 4 Node Nodes3 1h56m 500 1 4 Central 1 ECU 1 ECU4 0h36m 500 1 4 Local5 0h36m 500 5 1 Central6 0h28m 500 5 4 Central7 0h10m 500 5 4 Local Compute Time Compute Time8 0h27m 1722 5 5 Local 2h25m 0h36m9 1h19m 9349 5 10 Local Using by adding multiple processors with local databases, we achieve near linear scalability 36 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  37. 37. Time Trials STAAF Performance Tests# Time Apps ECUs Nodes Database Local Central DB DB1 2h25m 500 1 1 Central2 2h00m 500 1 2 Central 4 1 Nodes Node3 1h56m 500 1 4 Central 5 1 ECU4 0h36m 500 1 4 Local ECUs5 0h36m 500 5 1 Central6 0h28m 500 5 4 Central7 0h10m 500 5 4 Local Compute Time Compute Time8 0h27m 1722 5 5 Local 0h36m 0h36m9 1h19m 9349 5 10 Local By simply increasing the CPU capacity to 5 ECUs, we achieve the same performance as four 1 ECU nodes 37 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  38. 38. Time Trials STAAF Performance Tests# Time Apps ECUs Nodes Database Central Central DB DB1 2h25m 500 1 1 Central2 2h00m 500 1 2 Central 1 4 Node Nodes3 1h56m 500 1 4 Central 5 54 0h36m 500 1 4 Local ECUs ECUs5 0h36m 500 5 1 Central6 0h28m 500 5 4 Central7 0h10m 500 5 4 Local Compute Time Compute Time8 0h27m 1722 5 5 Local 0h36m 0h28m9 1h19m 9349 5 10 Local Once again, using a central database fails to achieve linear performance gains 38 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  39. 39. Time Trials STAAF Performance Tests# Time Apps ECUs Nodes Database Central Local DB DB1 2h25m 500 1 1 Central2 2h00m 500 1 2 Central 1 4 Node Nodes3 1h56m 500 1 4 Central 5 54 0h36m 500 1 4 Local ECUs ECUs5 0h36m 500 5 1 Central6 0h28m 500 5 4 Central7 0h10m 500 5 4 Local Compute Time Compute Time8 0h27m 1722 5 5 Local 0h36m 0h10m9 1h19m 9349 5 10 Local By using distributed, local databases we once again achieve near linear performance gains 39 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  40. 40. Time Trials STAAF Performance Tests# Time Apps ECUs Nodes Database Central Local DB DB1 2h25m 500 1 1 Central2 2h00m 500 1 2 Central 1 4 Node Nodes3 1h56m 500 1 4 Central 5 1 ECU4 0h36m 500 1 4 Local ECUs5 0h36m 500 5 1 Central6 0h28m 500 5 4 Central7 0h10m 500 5 4 Local Compute Time Compute Time8 0h27m 1722 5 5 Local 2h25m 0h10m9 1h19m 9349 5 10 Local By increasing CPU capacity, number of processing nodes, and number of databases, we decreased processing time by 14.5x 40 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  41. 41. Time Trials STAAF Performance Tests# Time Apps ECUs Nodes Database Local Local DB DB1 2h25m 500 1 1 Central 5 102 2h00m 500 1 2 Central Nodes Nodes3 1h56m 500 1 4 Central 5 54 0h36m 500 1 4 Local ECUs ECUs5 0h36m 500 5 1 Central6 0h28m 500 5 4 Central7 0h10m 500 5 4 Local 1722 Apps 9349 Apps8 0h27m 1722 5 5 Local Compute Time Compute Time9 1h19m 9349 5 10 Local 0h27m 1h19m Larger tests confirm that STAAF continues to scale linearly 41 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  42. 42. Initial Results :: Permissions Requests 53,000 Applications Analyzed Permissions Requested  Android Market: ~48,000  Average: 3  3rd Party Markets: ~5,000  Most Requested: 117 Location Data 11,929 (24%) Read Contacts 3,636 (8%) Send SMS 1,693 (4%) Receive SMS 1,262 (4%) Record Audio 1,100 (2%) Read SMS 832 (2%) Process Outgoing Calls 323 (1%)42 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  43. 43. Additional Results :: Shared Libraries 53,000 Applications Analyzed  Android Market: ~48,000  3rd Party Markets: ~5,000 com.admob 38% (18,426 apps ) org.apache 8% ( 3,684 apps ) com.google.android 6% ( 2,838 apps ) com.google.ads 6% ( 2,779 apps ) com.flurry 6% ( 2,762 apps ) com.mobclix 4% ( 2,055 apps ) com.millennialmedia 4% ( 1,758 apps)43 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  44. 44. Permissions Are Not a Good Indicator Malware only needs a single permission zsones & Droid Dream SMS Replicator Fake Security Tool Gemeni SMS Trojan44 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  45. 45. Presentation Roadmap • STAAF (Overview) • Background • STAAF (Deep Dive) • Results • Future Work • Conclusions45 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  46. 46. STAAF’s Future • Build a publically available user interface • Provide a dashboard with global stats • Further Tune database performance issues • Build more complex analysis modules – Static data flow analysis – Dynamic sandbox analysis • Expose a public module interface through UI46 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  47. 47. Presentation Roadmap • STAAF (Overview) • Background • STAAF (Deep Dive) • Results • Future Work • Conclusions47 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  48. 48. Final Thoughts • STAAF is a system of systems and services, not an application • STAAF enables large scale Android application analysis • STAAF is problem agnostic and can be tailored to answer many analytic questions • STAAF augments the capabilities of the analyst, it does not replace them • STAAF achieves scalable performance increases by increasing computer nodes/power48 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
  49. 49. Q&A49 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

×