[CB17] Trueseeing: Effective Dataflow Analysis over Dalvik Opcodes

TRUESEEING: EFFECTIVE DATAFLOW ANALYSIS OVER DALVIK OPCODES
Takahiro / Ken-ya Yoshimura
(@alterakey / @ad3liae)

WHO WE ARE
➤ Takahiro Yoshimura (@alterakey)
➤ CTO, Monolith Works Inc.
➤ Keybase: https://keybase.io/alterakey
➤ Ken-ya Yoshimura (@ad3liae)
➤ CEO, Monolith Works Inc.
➤ Keybase: https://keybase.io/ad3liae
➤ Monolith Works Inc.: http://monolithworks.co.jp/
➤ Talks: DEF CON 25 Demo Labs

WHAT WE DO
➤ alterakey
➤ Security Researcher
➤ iOS/Android
➤ Network pentesting
➤ ad3liae
➤ Security Researcher
➤ iOS/Android

FINDING VULNERABILITIES
➤ Static Analysis
➤ Reversing the target and deriving its behavior
➤ Reversing data flow is important
➤ Dynamic Analysis
➤ Running the target and seeing its behavior

PROBLEMS
➤ Obfuscation
➤ Common practice
➤ Hinders decompilers
➤ Dynamic Analysis
➤ Often unwanted :(

RELATED WORKS
➤ Mixing multiple decompilers (QARK et al.)
➤ Speed: takes even more time
➤ Fragility
➤ Mixing alone does not answer the question, IMHO

WHY IS DECOMPILING HARD?
➤ Decompiling requires…
➤ Accurate disassembling
➤ Common code patterns (e.g. function prologue)
➤ Obfuscators disrupt these

GO DIRECT
➤ Trueseeing
➤ Capable of
➤ Reversing data flow
➤ Loosely guessing constants/typesets/…
➤ Manifest analysis
➤ Uses no decompilers
➤ Speed
➤ Resiliency
➤ D8-ready
➤ Readily available on PyPI!

DISASSEMBLING
➤ Toolchain
➤ apktool
➤ SQLite3 DB

MARKING UP
➤ Parsing
➤ Regular mnemonics (op)
➤ Directives
➤ .class / .method
➤ .implements / .super etc.
➤ Annotations
➤ Marking
➤ methods
➤ classes
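
The marking pass above can be sketched in a few lines. This is only an illustration, assuming apktool-style smali text as input; the function name mark_up and the tuple layout are hypothetical and not taken from Trueseeing's code.

```python
import re

# Hypothetical line classifier; the real tokenizer may differ.
DIRECTIVE = re.compile(r'^\.(?P<name>[a-z-]+)\b\s*(?P<rest>.*)$')
OP = re.compile(r'^(?P<name>[a-z][a-z0-9/-]*)\s*(?P<rest>.*)$')

def mark_up(smali_text):
    """Yield (kind, name, rest, class, method) for each meaningful line."""
    current_class = current_method = None
    for raw in smali_text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#') or line.startswith(':'):
            continue  # skip blanks, comments and labels
        m = DIRECTIVE.match(line)
        if m:
            name, rest = m.group('name'), m.group('rest')
            if name == 'class':
                current_class = rest.split()[-1]
            elif name == 'method':
                current_method = rest.split()[-1]
            kind = 'annotation' if name == 'annotation' else 'directive'
            yield (kind, name, rest, current_class, current_method)
            if name == 'end' and rest == 'method':
                current_method = None  # leave the method scope
            continue
        m = OP.match(line)
        if m:
            # Regular mnemonic, marked with its enclosing class and method
            yield ('op', m.group('name'), m.group('rest'),
                   current_class, current_method)
```

Each emitted row carries the class and method it belongs to, which is what makes later per-method lookups cheap.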

GO FASTER
➤ Mapping codebase
➤ Constants
➤ Invocations
➤ sput
➤ iput
➤ Names (method, class)
➤ Class relationships
➤ Why SQL? Because complex queries matter
➤ Make DBs “think”
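
As a rough illustration of letting the database do the thinking, the sketch below loads marked-up ops into an in-memory SQLite table and asks one joined question: which string constants are loaded before a call to a given API in the same method? The schema and the function names (build_db, constants_near_calls) are assumptions made for this example, not Trueseeing's actual schema.

```python
import sqlite3

def build_db(rows):
    """Load (method, op, args) rows into an in-memory SQLite database."""
    db = sqlite3.connect(':memory:')
    db.executescript('''
        create table ops (id integer primary key, method text, op text, args text);
        create index ops_method on ops (method);
    ''')
    db.executemany('insert into ops (method, op, args) values (?, ?, ?)', rows)
    return db

def constants_near_calls(db, api_pattern):
    """String constants loaded earlier in a method that invokes a matching API."""
    return db.execute('''
        select c.method, c.args
          from ops as i
          join ops as c on c.method = i.method and c.id < i.id
         where i.op like 'invoke-%'
           and i.args like :api
           and c.op like 'const-string%'
         order by c.id
    ''', {'api': api_pattern}).fetchall()
```

A call such as constants_near_calls(db, '%SecretKeySpec;-><init>%') returns candidate constants only; deciding whether one actually reaches the call is the job of the dataflow tracing that follows.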

DATAFLOW TRACING (1)
➤ Lenient Backtracking
➤ From “interests” to the args
➤ Attempt to trace “interests” back to some constant (“solving” the constant)
➤ Interests
➤ API call arguments etc.
➤ Match register refs/writes
➤ move*, const*
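
A minimal sketch of the backtracking idea, assuming a method body is available as a list of (op, args) pairs. It walks backwards from the instruction of interest, follows move* copies and stops at the first const* that writes the tracked register. Control flow and other kinds of register writes are deliberately ignored, which is one possible reading of "lenient"; the function name solve_constant is hypothetical.

```python
def solve_constant(ops, index, reg):
    """Walk backwards from ops[index] to the constant that reaches `reg`.

    ops is a list of (mnemonic, [operands]); returns the constant operand
    as text, or None when no constant can be found.
    """
    for op, args in reversed(ops[:index]):
        if not args or args[0] != reg:
            continue  # instruction does not target the register we track
        if op.startswith('const'):
            return args[1]  # solved: const* vX, <value>
        if op.startswith('move'):
            if len(args) < 2:
                return None  # move-result*/move-exception: not a constant
            reg = args[1]  # follow the copy: move* vX, vY
    return None
```

For example, with const-string v1, "k3y" followed by move-object v0, v1 and then a call taking v0, solve_constant resolves v0 to "k3y".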

➤ Call tracing
➤ From args to the callers
➤ Climbing call stacks up
➤ Special case for handling p*
➤ Not always
➤ Currently R8 aggressively reuses p*
➤ WIP, soon to be fixed
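
Climbing from a parameter register to its callers can be sketched as below. It relies on the fact that the N-th parameter register (pN) lines up with the N-th register in the caller's invoke-* register list, since both lists include the receiver and both halves of wide values. The helper names and the (caller, op, args) row layout are assumptions for illustration; the /range form is skipped, and the p* reuse problem mentioned above is not addressed.

```python
import re

INVOKE = re.compile(r'^\{(?P<regs>[^}]*)\},\s*(?P<target>\S+)$')

def parse_invoke(args):
    """Split invoke-* operands into (register list, target method)."""
    m = INVOKE.match(args)
    if m is None:
        return None
    regs = [r.strip() for r in m.group('regs').split(',') if r.strip()]
    return regs, m.group('target')

def call_sites(ops, target, param):
    """Yield (caller, index, register) for each call feeding `param` of `target`."""
    n = int(param[1:])  # 'p1' -> 1
    for index, (caller, op, args) in enumerate(ops):
        if not op.startswith('invoke') or op.endswith('/range'):
            continue  # the {vA .. vB} range form is not handled in this sketch
        parsed = parse_invoke(args)
        if parsed is None:
            continue
        regs, callee = parsed
        if callee == target and n < len(regs):
            yield caller, index, regs[n]
```

Each yielded register can then be handed back to a constant solver in the caller's method body.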

➤ Static trace
➤ Matching sget/sput
➤ Solving constants in sput

➤ Instance trace
➤ Matching iget/iput
➤ Ignoring instance identity (WIP)
➤ Solving constants in iput
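
Both field traces reduce to the same matching step, sketched here: given a read (sget* or iget*), find every write (sput* or iput*) to the same field reference and report the source register, which can then be solved for a constant. For instance fields the object register is simply dropped, mirroring the "ignoring instance identity" simplification. The names field_of and writers_of are hypothetical.

```python
def field_of(args):
    """Return the field reference, the last operand of sget/sput/iget/iput."""
    return args.split(',')[-1].strip()

def writers_of(ops, read_index):
    """Find writes to the field read by ops[read_index].

    ops is a list of (mnemonic, operand text); returns (index, source register)
    pairs. The instance register of iget/iput is ignored on purpose.
    """
    op, args = ops[read_index]
    if op.startswith('sget'):
        prefix = 'sput'
    elif op.startswith('iget'):
        prefix = 'iput'
    else:
        raise ValueError('not a field read: %s' % op)
    field = field_of(args)
    return [(i, a.split(',')[0].strip())
            for i, (o, a) in enumerate(ops)
            if o.startswith(prefix) and field_of(a) == field]
```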

PATCH AND TIDY
➤ Partial update
➤ Disassemble
➤ Patch codebase/DB
➤ Re-assemble

BINARY PATCHING
➤ Removing (in smali)
➤ Removing insn
➤ Patch DB

AS AN EXPLOITATION TOOL
➤ Enabling debug
➤ Enabling full backup
➤ Replacing signature
➤ TLS un-pinning (WIP)
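
Enabling debug and full backup comes down to flipping two attributes on the application element of the decoded manifest. The sketch below assumes a text AndroidManifest.xml as produced by a disassembly step; the function name is hypothetical, and re-assembling and re-signing the package are out of scope.

```python
import xml.etree.ElementTree as ET

ANDROID = 'http://schemas.android.com/apk/res/android'
ET.register_namespace('android', ANDROID)  # keep the android: prefix on output

def enable_debug_and_backup(manifest_xml):
    """Return the manifest text with debuggable and allowBackup forced on."""
    root = ET.fromstring(manifest_xml)
    app = root.find('application')
    if app is None:
        raise ValueError('manifest has no <application> element')
    app.set('{%s}debuggable' % ANDROID, 'true')
    app.set('{%s}allowBackup' % ANDROID, 'true')
    return ET.tostring(root, encoding='unicode')
```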

REPORTING
➤ Scoring vulnerabilities
➤ HTML: Readable, comprehensive report
➤ Text: CI-friendly report

SCORING VULNERABILITIES
➤ CVSS 3.0 Temporal
➤ Profile-based fine-tuning
➤ Importance of vuln. classes
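
For reference, the CVSS 3.0 temporal score is the base score multiplied by the Exploit Code Maturity (E), Remediation Level (RL) and Report Confidence (RC) weights, rounded up to one decimal. The sketch below uses the weights from the CVSS v3.0 specification; the round-up helper is written in the integer style later published with v3.1, purely to avoid floating-point artifacts. Profile-based tuning and vulnerability-class weighting are not modeled here.

```python
# Metric weights from the CVSS v3.0 specification (X = Not Defined)
E = {'X': 1.0, 'H': 1.0, 'F': 0.97, 'P': 0.94, 'U': 0.91}
RL = {'X': 1.0, 'U': 1.0, 'W': 0.97, 'T': 0.96, 'O': 0.95}
RC = {'X': 1.0, 'C': 1.0, 'R': 0.96, 'U': 0.92}

def roundup(value):
    """Round up to one decimal place without floating-point surprises."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (scaled // 10000 + 1) / 10.0

def temporal_score(base, e='X', rl='X', rc='X'):
    """Temporal = roundup(Base x E x RL x RC)."""
    return roundup(base * E[e] * RL[rl] * RC[rc])
```

With a base score of 7.5, proof-of-concept exploit code (P), an official fix (O) and a confirmed report (C), the product is 6.6975, which rounds up to 6.7.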

REPORTING IN HTML
➤ Comprehensive, crisp report
➤ Summary
➤ Description
➤ Solution
➤ Risk Factor
➤ CVSS score
➤ Instances
➤ For humans

REPORTING IN TEXT
➤ gcc-like
➤ For CI system or something
➤ Continuous security
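
A gcc-like line is easy for CI tooling to parse because the location, severity and message sit in fixed positions. The exact format Trueseeing emits is not shown in these slides, so the layout below (source:line: severity: message) and the function name format_issue are assumptions for illustration only.

```python
def format_issue(source, line, severity, message, score=None):
    """Render one finding as a gcc-style diagnostic line."""
    location = f'{source}:{line}' if line is not None else source
    suffix = f' [CVSS {score:.1f}]' if score is not None else ''
    return f'{location}: {severity}: {message}{suffix}'
```

A CI job could then fail the build whenever any emitted line carries a severity it refuses to accept.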

CAPABILITY
➤ Most of OWASP Mobile Top 10 (2016)
➤ M1: Improper Platform Usage
➤ M2: Insecure Data Storage
➤ M3: Insecure Communication
➤ M4: Insecure Authentication
➤ M5: Insufficient Cryptography
➤ M6: Insecure Authorization
➤ M7: Client Code Quality Issues
➤ M8: Code Tampering
➤ M9: Reverse Engineering
➤ M10: Extraneous Functionality

CASE STUDY
➤ #1: InsecureBankV2 (DEF CON 25)
➤ #2: (CENSORED)
➤ #3: (CENSORED)
paper stack 1 SQ SEPIA 500X by wintersoul1 on flickr, CC-BY-NC-ND 2.0

CASE STUDY #1
➤ InsecureBankV2 (obfuscated)
➤ Announced at DEF CON 25
➤ Excellent ‘hack-me’ challenge
➤ Originally not obfuscated
➤ ProGuard rules based on “proguard-android-optimize”
➤ More passes: 5 -> 8
➤ Allow all optimizations (e.g. horizontal/vertical class merging)

M1: IMPROPER PLATFORM USAGE
➤ Insecure BroadcastReceiver
➤ Published with seemingly private action name
➤ Backup-able

M3: INSECURE COMMUNICATION
➤ TLS interception
➤ Lack of certificate pinning

M5: INSUFFICIENT CRYPTOGRAPHY
➤ App is using cryptographic functions with constant keys

CASE STUDY #2
➤ CENSORED: 
This page is unintentionally blank.
Blue Static by get directly down on flickr, CC-BY 2.0

M1: IMPROPER PLATFORM USAGE
➤ Massive privacy concerns
➤ Massive permission requests

M2: INSECURE DATA STORAGE
➤ Something written in a world-readable manner
➤ Massive logging
➤ A classic no-no

M3: INSECURE COMMUNICATION
➤ Not certain, but strong indications of cleartext HTTP
➤ Location?

M5: INSUFFICIENT CRYPTOGRAPHY
➤ App is using cryptographic functions with constant keys

M8: CODE TAMPERING
➤ Embedded public keys
➤ What if we replace them?

CASE STUDY #3
➤ CENSORED: 
This page is unintentionally blank.
static by Trevor Bashnick on flickr, CC-BY-NC 2.0

M7: CLIENT CODE QUALITY
➤ App registers a custom JS interface with addJavascriptInterface()
➤ On API < 17, JS interfaces can be exploited for arbitrary OS command execution
➤ Condition:
➤ Controlling content
➤ Targets or runs API < 17
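
The statically checkable half of that condition can be sketched as follows: the app invokes addJavascriptInterface() somewhere, and its minimum or target SDK level is below 17. Whether an attacker can control the loaded content cannot be decided from this check alone. The function name and the (op, args) row layout are assumptions; the SDK levels are passed in as plain integers.

```python
def js_interface_risk(ops, min_sdk, target_sdk):
    """True when a JS interface is registered and API < 17 is in play.

    ops is a list of (mnemonic, operand text) covering the whole app.
    """
    registers = any(
        op.startswith('invoke') and '->addJavascriptInterface(' in args
        for op, args in ops
    )
    return registers and (min_sdk < 17 or target_sdk < 17)
```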

GO FURTHER
➤ Roadmaps, TBDs
➤ Further binary patching mode
➤ Further accuracy
➤ Further signatures
➤ Further exploitation mode
➤ ARM code analysis
➤ MSIL code analysis
➤ iOS support
➤ True symbolic exec.
➤ Automatic dynamic analysis
摩周湖 by Sendai Blog on flickr, CC-BY 2.0

FURTHER BINARY PATCHING
➤ Status: Mostly done (PR soon)
➤ Introducing variable (in smali)
➤ Allocate a local
➤ Assign constant
➤ Replace offending arg.
➤ Patch DB
➤ Introducing function (in smali)
➤ Introduce templated function
➤ Introduce calls
➤ Patch DB
➤ Opens the way to more automatic code fixes
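
The "introducing variable" steps can be sketched on a single smali method: bump .locals to allocate a fresh register, load a constant into it just before the call, and swap it in for the offending argument. This is a simplified illustration with hypothetical names. It assumes the method uses a .locals directive placed before the call and a non-range invoke, and it does not check register-width limits (a non-range invoke can only address v0 to v15).

```python
import re

LOCALS = re.compile(r'^(\s*)\.locals (\d+)\s*$')
INVOKE_REGS = re.compile(r'\{([^}]*)\}')

def replace_arg_with_constant(lines, invoke_index, arg_pos, value):
    """Return a patched copy of a method's lines with one argument replaced."""
    out = list(lines)
    for i, line in enumerate(out):
        m = LOCALS.match(line)
        if m:
            count = int(m.group(2))
            out[i] = f'{m.group(1)}.locals {count + 1}'  # allocate a local
            break
    else:
        raise ValueError('no .locals directive found')
    new_reg = f'v{count}'  # first register past the old locals
    call = out[invoke_index]
    regs = [r.strip() for r in INVOKE_REGS.search(call).group(1).split(',')]
    regs[arg_pos] = new_reg  # replace the offending argument
    patched = INVOKE_REGS.sub(lambda _: '{' + ', '.join(regs) + '}', call, count=1)
    indent = re.match(r'^\s*', patched).group(0)
    out[invoke_index] = patched
    # assign the constant right before the call
    out.insert(invoke_index, f'{indent}const {new_reg}, {value:#x}')
    return out
```

After such a patch the analysis database would also need updating so that later queries see the new instructions.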

FURTHER ACCURACY
➤ Status: Mostly done (PR soon)
➤ Zoning storage (e.g. external as insecure)
➤ Solving only interesting args
➤ Selectively emulate API (e.g. StringBuilder)
➤ Recognizing more TLS pinning modes
➤ Carefully evaluate confidence

FURTHER SIGNATURES
➤ Status: WIP
➤ HTTP parameter injection
➤ Path traversal
➤ Client-side XSS/SQLi
➤ Weak crypto algorithms
➤ Insufficient root detection
➤ Questionable use of sensitive data
➤ Taint analysis
➤ File I/O
➤ Network I/O

FURTHER EXPLOITS
➤ Status: WIP
➤ TLS Unpinning
➤ Forcefully enabling logging
➤ Exploit generation on issues
➤ Reversing API spec?

ARM CODE ANALYSIS
➤ Status: WIP
➤ Native code analysis
➤ Considering radare2 (r2) and/or VEX IR
➤ Problem:
➤ r2 takes time
➤ r2 seemingly cannot disassemble the whole executable at once (cf. Produce File in IDA)

MSIL CODE ANALYSIS
➤ Status: WIP
➤ Mainly old versions of Unity (Mono)
➤ Considering use of CoreCLR

IOS
➤ Status: WIP
➤ Swift, Objective-C, bitcode analysis
➤ Considering use of radare2, VEX IR and LLVM tools
➤ Problems: much the same as ARM code analysis

TRUE SYMBOLIC EXEC.
➤ Status: In Research
➤ Symbolic exec. will help
➤ Forward analysis
➤ Evaluating reachability
➤ With it, we might be able to do..?
➤ Partial evaluation (e.g. Reversing transforms)
➤ Gaining more accuracy
➤ Gaining resiliency against more advanced obfuscators
➤ Considering use of VEX IR

AUTOMATIC DYNAMIC ANALYSIS
➤ Status: In Research
➤ Similar to MobSF

CONCLUSION
➤ We saw it is…
➤ Fast
➤ Accurate
➤ Intuitive
➤ Free as freedom
IMG_2988s by 不憂照相館 on flickr, CC-BY-NC-ND 2.0

FAST
➤ No decompiling
➤ Fast lookup with SQL
➤ Because complex queries matter

ACCURATE (1)
➤ We derive data flow directly over Dalvik opcodes
➤ Lenient Backtracking
➤ Call stack tracing
➤ Static tracing
➤ Instance tracing

ACCURATE (2)
➤ We can detect issues in (obfuscated) apps
➤ M1: inappropriate ContentProvider/BroadcastReceiver exports, privacy concerns, enabled debug/backup bit etc.
➤ M2: insecure file permissions, logging etc.
➤ M3: cleartext HTTP, TLS non-pinning etc.
➤ M5: static keys etc.
➤ M7: WebView insecurities etc.
➤ M8: embedded public keys etc.
➤ M9: non-obfuscation

INTUITIVE
➤ Comprehensive reporting
➤ HTML for humans
➤ Text for CI
➤ Continuous security

FREE AS FREEDOM
➤ GPL-3
➤ https://github.com/monolithworks/trueseeing
➤ It remains free for good
➤ More fixes and sigs to come
➤ We are striving to make it not only useful but also essential
Freedom by Mochamad Arief on flickr, CC-BY-NC-ND 2.0

FIN.
9.11.2017 Monolith Works Inc.
