TriggerScope: Towards Detecting Logic Bombs in Android Applications

TriggerScope: Towards Detecting Logic Bombs
in Android Applications
Yanick Fratantonio, Antonio Bianchi, William Robertson, Engin
Kirda, Christopher Kruegel, Giovanni Vigna
Pietro De Nicolao
Politecnico di Milano
June 21, 2016
1 / 26

Introduction to logic bombs
TriggerScope
Experiment
Critique
Related and future work
2 / 26

The paper
TriggerScope: Towards Detecting Logic Bombs
in Android Applications
Yanick Fratantonio∗, Antonio Bianchi∗, William Robertson†, Engin Kirda†, Christopher Kruegel∗, Giovanni Vigna∗
∗UC Santa Barbara
{yanick,antoniob,chris,vigna}@cs.ucsb.edu
†Northeastern University
{wkr,ek}@ccs.neu.edu
Abstract—Android is the most popular mobile platform today,
and it is also the mobile operating system that is most heavily
targeted by malware. Existing static analyses are effective in
detecting the presence of most malicious code and unwanted
information flows. However, certain types of malice are very dif-
ficult to capture explicitly by modeling permission sets, suspicious
API calls, or unwanted information flows.
One important type of such malice is malicious application
logic, where a program (often subtly) modifies its outputs or per-
forms actions that violate the expectations of the user. Malicious
application logic is very hard to identify without a specification of
the “normal,” expected functionality of the application. We refer
to malicious application logic that is executed, or triggered, only
App store providers invest significant resources to protect
their users and keep their platforms clean from malicious
apps. To prevent malicious apps from entering the market
(and to detect malice in already-accepted applications), these
providers typically use a combination of automated program
analysis (e.g., Google Bouncer [41]) and manual app reviews.
These automated approaches leverage static and/or dynamic
code analysis techniques, and they aim to detect potentially-
malicious behaviors – e.g., exfiltrating personal private infor-
mation, stealing second-factor authentication codes, sending
text messages to premium numbers, or creating mobile bot-
2016 IEEE Symposium on Security and Privacy2016 IEEE Symposium on Security and Privacy
3 / 26

Introduction to logic bombs
4 / 26

Logic bombs
Logic bomb: malicious application logic that is triggered only
under certain (narrow) conditions.
Malicious application logic: violation of user’s reasonable
expectations
Malware is designed to target speciﬁc victims, under certain
circumstances
Example: take a navigation application, supposed to help a soldier
in a war zone ﬁnd the shortest route to a location.
after a given (hardcoded) date, it gives to him longer, more
dangerous routes
5 / 26

Another (real) example
RemoteLock: Android app that allows the user to remotely lock
and unlock the device with a SMS containing an user-deﬁned
keyword
The app code also contains the following check:
(!= (#sms/#body equals "adfbdfgfsgyhytdfsw")) 0)
The predicate is triggered when an incoming SMS contains that
hardcoded string
By sending a SMS containing that string, the device unlocks
That’s a backdoor implemented with a logic bomb!
6 / 26

Problems with traditional defenses
App Stores employ some defenses, but they are not suﬃcient.
Static analysis: malicious application logic doesn’t require
additional privileges or make “strange” API calls
Malicious behavior is deeply hidden within the app logic
Example: the malicious navigation app. . . behaved like any
navigation app!
Dynamic analysis: likely won’t execute code triggered only on
a future date or in a certain location
Code coverage problems
Can be detected and evaded
Even if covered, how to discern malicious behavior from benign?
Manual audit: if source code is not available, no guarantees
Code can be obfuscated
7 / 26

Key observation
TriggerScope detects logic bombs by precisely analyzing and
characterizing the checks (conditionals, predicates) that guard
a given behavior.
It gives less importance to the (malicious) behavior itself.
if(sms.getBody().equals("adfbdf...")) // Look here!
{
myObject.doSomething(); // ...not there.
}
9 / 26

Trigger analysis
Predicate: logic formula used in a conditional statement
(&& (!= (#sms/#body contains "MPS:") 0) (!=
(#sms/#body contains "gps") 0))
Suspicious predicate: a predicate satisfied only under very
specific, narrow conditions
Functionality: a set of basic blocks in a program
Sensitive functionality: a functionality performing, directly or
indirectly a sensitive operation
In practice: all calls to Android APIs protected by permissions,
and operations involving the filesystem
Trigger: suspicious predicate controlling the execution of a
sensitive functionality
10 / 26

Analysis overview (1)
1. Static analysis of bytecode; building of Control Flow Graph
2. Symbolic Values Modeling for integer, string, time, location
and SMS-based objects
3. Expression Trees are built and appended to each symbolic
object referenced in a check
Reconstruction of the semantics of the check, often lost in
bytecode
'DWHDIWHU'DWH

QRZ
Figure 4: Example of an expression tree.
against sensitive values, in terms of both false positives and
false negatives. The details about this step are provided in
Section III-E.
methods (e.
Moreover, ou
from APIs su
which are p
checks on S
sender retriev
modeling of
Intents a
store).
Similarly,
Figure 1: Example of expression tree.
11 / 26

4. Block Predicate Extraction: edges of Control Flow Graph
are annotated with simple predicates
Simple predicate: P in if P then X else Y
5. Path Predicate Recovery and Minimization
Combine simple predicates to get the full path predicate that
reaches each basic block
Minimization: elimination of redundant terms in predicates
important to reduce false dependencies
12 / 26

Path Predicate Recovery and Minimization
f
b0
b1
b2 b3
b4
q
g() h()
p ⋀ q ¬p ⋀ q
(p ⋀ q) ⋁ (¬p ⋀ q) = q
b5
procedu
basic bl
forward
traversa
Boolean
block to
quences
logical
to branc
correspo
only cap
but also
for a blo
Note
icate rec
that con
moved,
dencies
if a bas
evaluate
Full predicate,
minimization
Simple
predicates
13 / 26

6. Predicate Classification: a check is suspicious if it’s
equivalent to:
Comparison between current time value and constant
Bounds check on GPS location
Hard-coded patterns on body or sender of SMS
7. Control-Dependency Analysis: control dependency between
suspicious predicates and sensitive functionalities.
sensitive = privileged Android APIs + fileystem ops
Suspiciousness propagates with data flows and callbacks
Problem: data flows through files
When in doubt: suspicious!
8. Post-processing: whitelisting for some edge cases
14 / 26

Data sets
Benign applications: 9582 apps from Google Play Store
They all use time-, location- or SMS-related APIs
Actually, TriggerScope identiﬁed backdoors in two “benign”
apps, conﬁrmed by manual inspection!
Malicious applications: 14 apps from several sources
Stealthy malware developed for previous researches
Real-world malware samples
HackingTeam RCSAndroid
16 / 26

Results of analysis
Analysis step TP FP TN FN FPR FNR
Predicate detection 14 1386 7927 0 14.88% 0%
Suspicious Predicate A. 14 462 8851 0 4.96% 0%
Control-Dependency A. 14 117 9196 0 1.26% 0%
TriggerScope (all) 14 35 9278 0 0.38% 0%
Table 1: Results of analysis after each step. Note how each step is useful
to reﬁne the analysis.
FPR = FP
FP+TN , FNR = FN
FN+TP
17 / 26

False Positives decreasing step after step
False Positives after each analysis step
0 400 800 1200 1600
Predicate detection
Suspicious predicate a.
Control-dependency a.
TriggerScope (all)
Analysisstep
Foglio1
Figure 3: Each analysis stage is useful, because it reduces False Positives.
18 / 26

Strengths
TriggerScope provides rich semantics on predicates
Great help for manual audits
This makes the tool extensible, open for future research
Novel approach: focus on checks, not malicious behaviors
Fewer FPs, FNs than other tools
20 / 26

Issues: limits of analysis
Definition of suspicious predicate is too narrow
Only checks against hardcoded values are considered
Triggers can come from the network or elsewhere
Authors claim 0% FNs, but the evaluation isn’t conclusive
we manually inspected a random subset of 20 applications for
which our analysis did not identify any suspicious check. We
spent about 10 minutes per application, and we did not find any
false negatives.
Difficult to assess FNs if no tool finds anything and source code
is unavailable
This analysis is still blacklisting: listing things we don’t like
We’re competing against attackers’ creativity
21 / 26

Issues: evasion techniques
Reflection, dynamic code loading, polymorphism and
conditional code obfuscation [Sharif, 2008] can defeat static
analysis.
Authors say that these techniques are themselves suspicious, but
they also have legitimate uses
Predicate minimization is NP-complete
Is it possible to design “pathological” code to slow down and
defeat analysis?
Or result in very complex, meaningless predicates?
Exceptions were not cited as a control flow subversion method
Statically reasoning in the presence of exceptions and about the
effects of exceptions is challenging [Liang, 2014]
Unclear how the static analysis engine handles exceptions
Unchecked exceptions (e.g. division by zero) could be exploited
as stealthy triggers
22 / 26

Related and future work
23 / 26

Related work: AppContext
AppContext [Yang, 2015]: supervised machine learning method to
classify malicious behavior statically
1. Starts identifying suspicious actions
2. Context: which category of input controls the execution of
those actions?
Similar idea: just looking at the action isn’t enough. Diﬀerences:
AppContext only classiﬁes triggers as suspicious or not;
TriggerScope also provides semantics about the predicates,
helping manual inspection
AppContext does not consider the typology of the predicate,
only the type of its inputs
Higher FP rate than TriggerScope
24 / 26

Future evolutions
Extend trigger analysis not only to time, location, SMSs
The trigger could come e.g. from the network
The framework is easily extensible to other types of triggers
with more work to model other symbolic values
Is there a more general approach?
Quantitative analysis of predicate “suspiciousness”
Currently, it’s deﬁned in a qualitative, ad-hoc way
Could be combined with classiﬁcation methods
25 / 26

TriggerScope: Towards Detecting Logic Bombs in Android Applications

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to TriggerScope: Towards Detecting Logic Bombs in Android Applications

Similar to TriggerScope: Towards Detecting Logic Bombs in Android Applications (20)

Recently uploaded

Recently uploaded (20)

TriggerScope: Towards Detecting Logic Bombs in Android Applications