This document summarizes an upcoming presentation on program analysis from a security perspective. It will cover topics like taint analysis, symbolic execution, concolic execution, disassembly, decompilation, and case studies analyzing the Tempesta tool versus CryptoPHP malware and the Ursnif malware. The presentation will be given on October 7, 2015 to the ISACA Venice Chapter by Antonio Parata and will explore program analysis techniques for understanding malicious behavior, identifying vulnerabilities, and reversing code.
ISACA VENICE Chapter
Agenda
▪ Introduction
➢ Mobile world
▪ Theory
➢ Taint Analysis
➢ Symbolic Execution
➢ Concolic Execution
▪ Code De/Obfuscation
➢ Disassembler
➢ Decompiler
▪ Case Studies
➢ Tempesta Vs CryptoPHP
➢ Ursnif
07/10/2015
Who am I?
▪ LEAD THE COMMUNICATION VALLEY R&D TEAM
▪ OWASP ITALY BOARD MEMBER
▪ PASSIONATE ABOUT SOFTWARE SECURITY AND SOFTWARE DEVELOPMENT
▪ Developed various security tools, such as Nebula and Tempesta
▪ CONTACTS
▪ Blog: http://antonioparata.blogspot.it/
▪ GitHub: https://github.com/enkomio
Introduction (1/4)
CURRENT PRACTICE FOR SOFTWARE ASSURANCE
▪ Testing
▪ PRO: A concrete failure proves that an issue exists
▪ CONS: Expensive, difficult, hard to cover all code paths
Diagram: Malformed Input → Program → Oracle ("Is it correct?")
Introduction (2/4)
CURRENT PRACTICE FOR SOFTWARE ASSURANCE
▪ Code Auditing
▪ PRO: Human can generalize beyond single runs
▪ CONS: Expensive, hard, no guarantees
Introduction (3/4)
STATIC ANALYSIS
▪ Analyze program’s code without running it
➢ In a sense, we are asking a computer to do what a human does during code review
▪ PRO: much higher coverage
➢ Reason about many possible runs of the program
➢ Reason about incomplete programs
▪ CONS:
➢ Can only analyze limited properties
➢ May miss some errors, or have false positives
➢ Can be time consuming to run
Introduction (4/4)
PROGRAM ANALYSIS
▪ "Program Analysis offers static compile-time techniques for predicting safe and computable approximations to the set of values or behaviours arising dynamically at run-time when executing a program on a computer." (Principles of Program Analysis)
▪ "Program Analysis is the process of automatically analyzing the behavior of computer programs regarding a property such as correctness, robustness, safety and liveness. Program analysis focuses on two major areas: program optimization and program correctness." (Wikipedia)
…or, put another way…
Mobile (Malicious) World
WHY MOBILE MALWARE?
▪ Lots of personal data
▪ Lots of business data
▪ Easy access to company infrastructure (BYOD)
▪ AV can only statically scan installed apps
▪ Mobile applications run in a restricted (sandbox) environment
▪ Unable to do “dirty” things
Mobile Malware (1/3)
▪ Malware downloaded from unofficial app stores
▪ Malware installed from compromised computers connected to the mobile phone
▪ …malware installed from official store (see XCodeGhost)
▪ Simplelocker
Mobile Malware (2/3)
▪ XCodeGhost: a very clever idea
▪ A trojanized version of the XCode IDE is uploaded to unofficial markets
▪ XCode is the de facto standard IDE for creating iOS apps
▪ Every time a new application is compiled, a trojanized version of the core libraries is linked into the mobile app
▪ The app is uploaded to the official market (App Store) by a developer who does not know that it was trojanized
▪ Once a device is infected, the malware collects information, which is encrypted and sent to the C&C server
▪ The infected device can also receive commands from the C&C server
Mobile Malware (3/3)
▪ How different is it to analyze mobile apps?
▪ Not very different from analyzing normal (non-mobile) applications
▪ Step 1: get access to the application that you want to analyze
▪ Step 2: create an environment where you can analyze the application (sandbox, emulator and so on)
▪ Step 3: profit
▪ Some useful tools:
Taint Analysis (1/2)
TAINT ANALYSIS
▪ Taint analysis is a popular method that checks which variables can be modified by user input.
▪ The root cause of many attacks is trusting unvalidated input
➢ Input from the user is tainted
➢ Various data is used, assuming it is untainted
▪ Examples:
* http://sseblog.ec-spride.de/tools/flowdroid/
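The tainted/untainted distinction can be sketched in a few lines of Python as a form of dynamic taint tracking. This is only an illustration under stated assumptions: the names `Tainted`, `source`, `sanitize` and `sink` are hypothetical and are not part of FlowDroid or of any other tool mentioned in this deck, and the quote-doubling sanitizer is a placeholder rather than a recommended defense.

```python
class Tainted(str):
    """A string that carries a taint mark (it derives from user input)."""

    def __add__(self, other):
        # Concatenation propagates the taint: tainted + anything is tainted.
        return Tainted(str(self) + str(other))

    def __radd__(self, other):
        # Same rule when the tainted value is the right-hand operand.
        return Tainted(str(other) + str(self))


def source(value):
    """Taint source: everything read from the user is marked."""
    return Tainted(value)


def sanitize(value):
    """Hypothetical sanitizer: returns a plain (untainted) string."""
    return str(value).replace("'", "''")


def sink(query):
    """Sensitive sink: refuses data that still carries the taint mark."""
    if isinstance(query, Tainted):
        raise ValueError("tainted data reached a sensitive sink")
    return "executed: " + query


if __name__ == "__main__":
    name = source("o'neil")
    print(sink("SELECT * FROM users WHERE name = '" + sanitize(name) + "'"))
```

Passing `name` to the query without `sanitize` makes `sink` raise, which is the "various data is used, assuming it is untainted" mistake made visible.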
Taint Analysis (2/2)
TAINT ANALYSIS TOOLS
▪ Static Taint Analysis
➢ FlowDroid is a context-, flow-, field-, object-sensitive and lifecycle-aware static taint analysis tool for Android applications
▪ Dynamic Taint Analysis
➢ Taint analysis and pattern matching with Pin. The project is part of the Triton Framework, which we will see later.
➢ http://shell-storm.org/blog/Taint-analysis-and-pattern-matching-with-Pin/
Symbolic Execution (1/4)
DEFINITION
▪ A key goal of symbolic execution in the context of software testing is to
explore as many different program paths as possible in a given amount of
time, and for each path to generate a set of concrete input values
exercising that path, and check for the presence of various kinds of errors
including assertion violations, uncaught exceptions, security
vulnerabilities, and memory corruption.*
* Symbolic Execution for Software Testing: Three Decades Later
15. 15ISACA VENICE Chapter
Symbolic Execution (2/4)
FORKING EXECUTION
▪ Symbolic executors can fork at branching points
➢ Happens when there are solutions to both the path condition and its negation
▪ How to systematically explore both directions?
➢ Check feasibility during execution and queue feasible path (condition)s for
later consideration
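A toy version of forking execution, as a hedged sketch rather than a real engine: the program is modelled as a small branch tree, the path condition is a list of predicates, and a brute-force search over a tiny integer domain stands in for the SMT solver. The program, the domain and all names (`PROGRAM`, `solve`, `explore`) are assumptions made for illustration.

```python
from itertools import product

# Assumption: a tiny input domain, so brute force can stand in for an SMT solver.
DOMAIN = range(-10, 11)


def solve(path_condition):
    """Return concrete (x, y) satisfying every constraint, or None if infeasible."""
    for x, y in product(DOMAIN, DOMAIN):
        if all(constraint(x, y) for constraint in path_condition):
            return (x, y)
    return None


def negate(cond):
    return lambda x, y: not cond(x, y)


# Toy program as a branch tree: (condition, true_subtree, false_subtree) or a leaf label.
PROGRAM = (
    lambda x, y: x > 5,
    (lambda x, y: x + y == 7, "bug", "ok-1"),
    (lambda x, y: x > 8, "unreachable", "ok-2"),
)


def explore(program):
    """Fork at every branch; queue only directions whose path condition is feasible."""
    results = {}
    worklist = [(program, [])]          # (remaining program, path condition)
    while worklist:
        node, pc = worklist.pop()       # pop() explores depth-first; pop(0) would be BFS
        if isinstance(node, str):       # leaf: emit a concrete input exercising this path
            results[node] = solve(pc)
            continue
        cond, if_true, if_false = node
        for branch_cond, subtree in ((cond, if_true), (negate(cond), if_false)):
            new_pc = pc + [branch_cond]
            if solve(new_pc) is not None:   # feasibility check before queueing
                worklist.append((subtree, new_pc))
    return results


if __name__ == "__main__":
    for label, inputs in sorted(explore(PROGRAM).items()):
        print(label, inputs)
```

The `unreachable` leaf never appears in the result because `x <= 5` and `x > 8` cannot both hold, so that direction is never queued.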
Symbolic Execution (3/4)
PATH EXPLOSION AND CONSTRAINT SOLVING PROBLEM
* How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2014)
▪ Path search: DFS (Depth First Search), BFS (Breadth First Search),
Random
▪ Constraint Solving: use an SMT solver
➢ A very popular SMT solver is Z3: https://github.com/Z3Prover/z3
Symbolic Execution (4/4)
KLEE LLVM EXECUTION ENGINE
▪ KLEE is a symbolic virtual machine built on top of the LLVM compiler
infrastructure
▪ Uses the STP constraint solver (http://stp.github.io/)
▪ The source code must be modified in order to run KLEE
➢ We need to mark which variables should be treated as symbolic values
https://klee.github.io/
Concolic execution (1/4)
▪ Also called dynamic symbolic execution
▪ Instrument the program to do symbolic execution as the program runs
➢ Shadow concrete program state with symbolic variables
➢ Initial concrete state determines initial path, could be randomly generated
➢ Keep shadow path condition!
▪ Explore one path at a time, start to finish
➢ The next path can be determined by negating some element of the last path
condition, and solving for it, to produce concrete inputs for the next test
➢ Always have a concrete underlying value to rely on
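The loop described above can be sketched as follows. It is a minimal illustration, not a real concolic engine: the instrumented `program`, the brute-force `solve` (standing in for an SMT solver over an assumed small domain) and the `concolic` driver are all hypothetical names.

```python
def program(x, trace):
    """Instrumented toy program: every branch records (constraint, outcome)."""
    def branch(constraint):
        outcome = constraint(x)
        trace.append((constraint, outcome))   # the shadow path condition
        return outcome

    if branch(lambda v: v % 2 == 0):
        if branch(lambda v: v > 100):
            if branch(lambda v: v * 3 == 372):
                return "bug"
    return "ok"


def solve(constraints, domain=range(0, 1000)):
    """Brute-force stand-in for an SMT solver (assumption: tiny input domain)."""
    for candidate in domain:
        if all(cond(candidate) == wanted for cond, wanted in constraints):
            return candidate
    return None


def concolic(start):
    """Run concretely, then negate one branch decision at a time to get new inputs."""
    seen, queue, outcomes = set(), [start], {}
    while queue:
        x = queue.pop()
        trace = []
        outcomes[x] = program(x, trace)       # one full concrete run, start to finish
        decisions = tuple(taken for _, taken in trace)
        for i in range(len(decisions)):
            seen.add(decisions[: i + 1])      # these prefixes have now been executed
        for i, (cond, taken) in enumerate(trace):
            flipped = decisions[:i] + (not taken,)
            if flipped in seen:
                continue
            seen.add(flipped)
            # Keep the prefix of the path condition, negate element i, solve.
            new_input = solve(trace[:i] + [(cond, not taken)])
            if new_input is not None:
                queue.append(new_input)
    return outcomes


if __name__ == "__main__":
    print(concolic(1))
```

Starting from the arbitrary input 1, each run contributes one path condition, and negating its last unexplored element yields the next concrete input until the `bug` branch is reached.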
Concolic execution (2/4)
▪ Concolic execution makes it really easy to concretize
➢ Replace symbolic variables with concrete values that satisfy the path condition
❖ Always have these around in concolic execution
▪ So, could actually do system calls!
➢ But we lose symbolic-ness at such calls
▪ And can handle cases when conditions are too complex for SMT solver
Disassembler
DISASSEMBLER
▪ A disassembler is a computer program that translates machine language into assembly language, the inverse operation to that of an assembler. A disassembler differs from a decompiler, which targets a high-level language rather than an assembly language
Diagram: Application → Disassembling → Assembly Code
Decompiler (1/5)
DECOMPILER
▪ Performs the reverse operation to that of a compiler
▪ Pro: Decompilation of bytecode is very powerful
▪ Cons: Decompilation of native binary code gives much poorer results
Diagram: Application → Decompilation → Source Code
Decompiler (3/5)
ANTI-DECOMPILATION TRICKS
IL_0014: NOP
IL_0015: LDARG.0
IL_0016: CALL INSTANCE VOID CONSOLEAPPLICATION.SIMPLECLASS::SAYHELLO()
IL_001B: NOP
IL_001C: LDC.I4.1
IL_001D: STLOC.0
IL_001E: BR.S IL_0024
Disassemble
▪ The if branch is never taken and the SayHello instance method is never invoked: the function always returns false regardless of the input value
▪ By convention, before calling an instance method a pointer to this must be pushed on the stack. Inside an instance method, ldarg.0 loads that pointer
Decompiler (4/5)
ANTI-DECOMPILATION TRICKS
IL_0014: NOP
// IL_0015: LDARG.0
IL_0016: CALL INSTANCE VOID CONSOLEAPPLICATION.SIMPLECLASS::SAYHELLO()
IL_001B: NOP
IL_001C: LDC.I4.1
IL_001D: STLOC.0
IL_001E: BR.S IL_0024
Assemble
▪ Open the MSIL source code file with your preferred editor
▪ Comment out the loading of the this pointer (add the characters // at the start of the line)
▪ Assemble the file with ilasm (ilasm.exe msil.il)
▪ Open your preferred decompiler and try to decompile the IsTwo routine
Decompile
Real World Program Analysis
MALWARE ANALYSIS
▪ Understand the malicious
behaviour of the program
SECURITY ASSESSMENT
▪ Identify possible vulnerabilities that
can compromise the security of
the application
REVERSE CODE ENGINEERING
▪ Understand how a specific
program works for further
analysis or to mimic the
behaviour
Case Studies – Tempesta Vs CryptoPHP (1/5)
TEMPESTA
▪ A PHP source code analysis service, useful for quickly identifying interesting information in possibly malicious PHP scripts
▪ URLs:
➢ http://enkomio.com/tempesta/#/
➢ http://antonioparata.blogspot.it/2015/09/cryptophp-vs-tempesta.html
CRYPTOPHP
▪ CryptoPHP is a threat that uses backdoored Joomla, WordPress and Drupal themes and plug-ins to compromise webservers on a large scale. The pirated themes and plug-ins are published free for anyone to use instead of having to pay for them
▪ Info: https://foxitsecurity.files.wordpress.com/2014/11/cryptophp-whitepaper-foxsrt-v4.pdf
Case Studies – Tempesta Vs CryptoPHP (2/5)
CRYPTOPHP
▪ It is typically obfuscated (even if not with a very strong algorithm)
▪ It backdoors the most common CMSs in order to ensure persistence
➢ WordPress plugin
➢ Joomla plugin
Case Studies – Tempesta Vs CryptoPHP (3/5)
WHICH KIND OF INFORMATION ARE WE INTERESTED IN?
▪ We are interested in information that tells us where the stolen data is sent (data exfiltration) or where the C&C is located
➢ IP addresses
➢ Contacted email addresses
➢ Contacted URLs
▪ How can we extract this kind of information from the code?
▪ Symbolic execution to the rescue: simulate the code and follow every branch in order to try to cover all code paths
➢ Pro: all paths are followed, with a high degree of reachability
➢ Cons: may cause some false positives
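Once the simulated execution has recovered the string values that reach network or mail functions on each path, picking out the indicators is plain pattern matching. The sketch below shows only that last step; it does not reproduce how Tempesta works internally, and the regular expressions, the `extract_indicators` name and the sample strings are assumptions chosen for illustration.

```python
import re

# Deliberately loose patterns: false positives are acceptable for triage.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
URL_RE = re.compile(r"https?://[^\s'\"<>]+")


def extract_indicators(strings):
    """Collect IP addresses, email addresses and URLs from recovered strings."""
    found = {"ips": set(), "emails": set(), "urls": set()}
    for s in strings:
        found["ips"].update(IP_RE.findall(s))
        found["emails"].update(EMAIL_RE.findall(s))
        found["urls"].update(URL_RE.findall(s))
    return found


if __name__ == "__main__":
    # Hypothetical strings, as they might look after deobfuscation.
    recovered = [
        "$c = 'http://c2.example.com/gate.php';",
        "mail('drop@example.org', $subject, $body);",
        "fsockopen('203.0.113.7', 80);",
    ]
    print(extract_indicators(recovered))
```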
Case Studies – Tempesta Vs CryptoPHP (5/5)
STATIC ANALYSIS LIMITATION
▪ Let’s consider a very basic DGA algorithm:
▪ Who knows which domains are contacted?
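As an illustration only, a date-seeded DGA could look like the sketch below. It is a hypothetical algorithm written for this example (the function name, the hash choice and the domain format are all assumptions), not the one discussed in the talk and not taken from any real malware family.

```python
import hashlib
from datetime import date


def generate_domains(day, count=3, tld=".com"):
    """Hypothetical DGA: derive `count` domain names from the given date."""
    domains = []
    for i in range(count):
        seed = f"{day.isoformat()}-{i}".encode()
        digest = hashlib.sha256(seed).hexdigest()
        domains.append(digest[:12] + tld)     # first 12 hex characters as the label
    return domains


if __name__ == "__main__":
    # The domains depend on the day the code runs, not on anything in the source.
    print(generate_domains(date.today()))
```

No domain name appears anywhere in the code, so extracting strings or following branches statically yields nothing useful: the contacted domains only exist once the algorithm is evaluated for a concrete date.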
Case Studies – Ursnif malware (1/5)
URSNIF
▪ A Data-Stealing malware
▪ Info: http://blog.trendmicro.com/trendlabs-security-intelligence/ursnif-the-multifaceted-malware/
▪ MD5: 7B6A4CB12AAC9C30D46FF6CB60CBE684
▪ The analyzed sample is packed
➢ Difficult to do static analysis without first unpacking it
▪ After unpacking, the sample injects itself into explorer.exe
➢ This choice is pretty common for malware; debugging explorer.exe is not very user friendly
Case Studies – Ursnif malware (2/5)
DATA ENCRYPTION
▪ The stolen data is sent to the C&C in encrypted form
▪ Example of request:
thfcxcofa.php?vlxch=mPihsm98FIH4Q/a6mVUmVvTw5k0eDh9uB1o86GNWmHbGWWERbnoeFVdNbeqhqU/W+mqbmJbkReehn41IbaAm+2V5tI1Hzl1p7gh7enGkgUJ4XzyM5c5dWs6kIyhLmRJV0TecNh3LTWNKjn/wSiCUyS==
▪ The page name and parameter name are randomly generated, starting from a call to GetTickCount
▪ The Base64-encoded payload carries data encrypted with the hardcoded key “87694321POIRYTRI”
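Splitting such a request into its parts is the easy half of the job. The sketch below, with the hypothetical helper name `parse_request`, separates the random page name and parameter name from the payload and Base64-decodes it; the decoded bytes are still encrypted, and the cipher itself is not reproduced here.

```python
import base64
from urllib.parse import urlsplit


def parse_request(request):
    """Return (page name, parameter name, raw ciphertext bytes)."""
    parts = urlsplit(request)
    # Split on the first "=" only: Base64 padding also uses "=".
    # parse_qs is avoided on purpose, since it would turn "+" into a space.
    name, _, value = parts.query.partition("=")
    return parts.path, name, base64.b64decode(value)


if __name__ == "__main__":
    # Synthetic request built for the example (not captured traffic).
    payload = base64.b64encode(bytes(range(32))).decode()
    page, param, ciphertext = parse_request("abcdefg.php?qwert=" + payload)
    print(page, param, len(ciphertext))
```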
Case Studies – Ursnif malware (4/5)
DATA ENCRYPTION
▪ Trying to understand that amount of code is very difficult without at least a hint about which type of algorithm is used
➢ You can try to identify it, e.g. by using YARA rules
❖ https://github.com/Yara-Rules/rules/blob/master/crypto.yar
▪ Alternatively, emulate the code with a CPU emulator
➢ http://www.unicorn-engine.org/
➢ https://github.com/buffer/pylibemu
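The identification step rests on the fact that many algorithms embed well-known constants. The sketch below reduces the idea to a plain byte search; it is not YARA syntax or the YARA API, and the signature table and the `identify_crypto` name are assumptions that cover only three widely documented constants.

```python
import struct

# A few well-known constants, stored little-endian as they appear in x86 code/data.
SIGNATURES = {
    "MD5/SHA-1 init constant": struct.pack("<I", 0x67452301),
    "TEA/XTEA delta": struct.pack("<I", 0x9E3779B9),
    "AES S-box": bytes.fromhex("637c777bf26b6fc53001672bfed7ab76"),
}


def identify_crypto(blob):
    """Return the names of the known constants found in a binary blob."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in blob)


if __name__ == "__main__":
    # Synthetic blob: padding bytes around an embedded TEA delta.
    sample = b"\x90" * 8 + struct.pack("<I", 0x9E3779B9) + b"\xc3"
    print(identify_crypto(sample))
```

A hit is only a hint: constants can be shared between algorithms or obfuscated away, which is why emulating the routine remains useful.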
Case Studies – Ursnif malware (5/5)
DATA ENCRYPTION
▪ By using the Unicorn engine we are able, with a “simple” Python script, to encrypt arbitrary data
import struct
from unicorn import Uc, UcError, UC_ARCH_X86, UC_MODE_32
from unicorn.x86_const import (X86_REG_EAX, X86_REG_ECX,
                               X86_REG_EBP, X86_REG_ESP)

# The listing ends with a return, so it is the body of a function;
# the function name used here is illustrative.
def emulate_encrypt(encrypt_data_code, used_key, plaintext_string):
    # Initialize emulator (x86, 32-bit mode)
    mu = Uc(UC_ARCH_X86, UC_MODE_32)
    # Map 16 KB of memory for the extracted encryption routine
    CODE_BASE = 0x01000000
    CODE_SIZE = 128 * 128
    mu.mem_map(CODE_BASE, CODE_SIZE)
    mu.mem_write(CODE_BASE, encrypt_data_code)
    # Map the key value; the routine receives the key pointer in EAX
    KEY_MEM_SIZE = 1 * 128 * 128
    KEY_MEM_BASE = 0x06000000
    mu.mem_map(KEY_MEM_BASE, KEY_MEM_SIZE)
    mu.mem_write(KEY_MEM_BASE + 0x100, used_key)
    mu.reg_write(X86_REG_EAX, KEY_MEM_BASE + 0x100)
    # Map the plaintext value; the plaintext pointer goes in ECX
    PLAINTEXT_MEM_BASE = 0x04000000
    PLAINTEXT_MEM_SIZE = 1 * 128 * 128
    mu.mem_map(PLAINTEXT_MEM_BASE, PLAINTEXT_MEM_SIZE)
    mu.mem_write(PLAINTEXT_MEM_BASE, plaintext_string)
    mu.reg_write(X86_REG_ECX, PLAINTEXT_MEM_BASE)
    # Map the encrypted result memory
    RESULT_MEM_BASE = 0x02000000
    RESULT_MEM_SIZE = 1 * 128 * 128
    mu.mem_map(RESULT_MEM_BASE, RESULT_MEM_SIZE)
    # Set up stack memory
    STACK_SIZE = 1 * 128 * 128
    STACK_BASE = 0x7FFF0000
    mu.mem_map(STACK_BASE, STACK_SIZE)
    mu.reg_write(X86_REG_EBP, STACK_BASE + 0x1000)
    mu.reg_write(X86_REG_ESP, STACK_BASE + 0x100)
    # First stack argument: little-endian pointer to the result buffer
    mu.mem_write(STACK_BASE + 0x100 + 4, struct.pack("<I", RESULT_MEM_BASE))
    result_mem = result_key = None
    try:
        mu.emu_start(CODE_BASE, CODE_BASE + len(encrypt_data_code))
        # Read the result (first 16 bytes of the result buffer)
        result_mem = mu.mem_read(RESULT_MEM_BASE, 0x10)
        # Read the updated key
        result_key = mu.mem_read(KEY_MEM_BASE + 0x100, len(used_key))
    except UcError as e:
        print("ERROR: %s" % e)
    return result_mem, result_key