SlideShare a Scribd company logo
1 of 75
Download to read offline
Dynamic Binary Analysis and Obfuscated Codes
How to don’t kill yourself when you reverse obfuscated codes.
Jonathan Salwan and Romain Thomas
St’Hack in Bordeaux, April 8, 2016
Quarkslab
About us
Romain Thomas
• Security Research Engineer at Quarkslab
• Working on obfuscation and software protection
Jonathan Salwan
• Security Research Engineer at Quarkslab
• Ph.D student on Software Verification supervised by:
• S´ebastien Bardin from CEA
• Marie-Laure Potet from VERIMAG
2
Roadmap of this talk
1. Obfuscation introduction
2. Dynamic Binary Analysis introduction
3. The Triton framework
4. Conclusion
5. Future works
3
Obfuscation Introduction
What is an obfuscation?
Wikipedia: ”Obfuscation is the obscuring of intended meaning in
communication, making the message confusing, willfully ambiguous, or
harder to understand.” 1
1
https://en.wikipedia.org/wiki/Obfuscation
5
Why softwares may contain obfuscated codes?
• Intellectual property
• DRM
• Hiding secrets
6
What kind of obfuscations may we find in modern softwares?
• Opaque predicates
• Control-flow flattening
• Virtualization
• MBA and bitwise operations
• Use of uncommon instructions.
7
Example: Opaque predicates
• Objective: Create unreachable basic blocks
• The constraint ¬π1 is always UNSAT
• The basic block ϕ3 is never executed
ϕ1
ϕ2 ϕ3
ϕ4
π1
¬π1
Figure 1: Control Flow Graph
8
Example: CFG flattened
• Objective: Remove structured control flows
• The basic block ϕ2 is now used as dispatcher
• The dispatcher manages the control flow
• Static analysis: hard to predict which basic block will be called next
ϕ1 ϕ2
ϕ3
ϕ4
ϕ5 ϕ6
Figure 2: Original CFG
ϕ1
ϕ2
ϕ3 ϕ4 ϕ5 ϕ6
Figure 3: Flattened CFG
9
Example: Virtualization
• Objective: Emulate the original code via a custom ISA (Instruction Set
Architecture)
• Example:
xor R1, R2 push R1
push R2
mov eax, [esp]
mov ebx, [esp - 0x4]
xor eax, ebx
push eax
10
Example: Virtualization
Figure 4: An example of a VM’s CFG
11
Example: MBA and bitwise operations
• Objective: Transform the normal form of an expression to a more complex one
• The transformation output may also be transformed again and so one
a + b = (a ∨ b) + (a ∧ b)
a ∗ b = (a ∧ b) ∗ (a ∨ b) + (a ∧ ¬b) ∗ (¬a ∧ b)
a ⊕ b = (a ∧ ¬b) ∨ (¬a ∧ b)
a ⊕ b = ((a ∧ ¬a) ∧ (¬b ∨ ¬a)) ∧ ((a ∨ b) ∨ (¬b ∨ b))
0 = (a ∨ b) − (a + b) + (a ∧ b)
12
Example: Use of uncommon instructions
• Objective:
• Break your tools
• Break your mind!
• May transform classic operations
using AVX and SSE
Figure 5: Uncommon instructions
13
Dynamic Binary Analysis Introduc-
tion
What is a DBA?
• Dynamic Binary Analysis
• Any way to analyze a binary dynamically
• Most popular analysis
• Dynamic information extraction
• Dynamic taint analysis [4]
• Dynamic symbolic execution [3, 2, 6, 1]
15
Why use a DBA?
• To get runtime values at each program point
• To get the control flow for a given input
• To follow the spread of a specific data
16
What is a dynamic taint analysis?
• Taint analysis is used to follow a specific information through a data flow
• Cell memory
• Register
• The taint is spread at runtime
• At each program point you are able to know what cells and registers interact with
your initial value
17
What is a dynamic symbolic execution?
• A DSE is used to represent the control and the data flow of an execution into
arithmetical expressions
• These expressions may contain symbolic variables instead of concrete values
• Using a SMT solver 23 on these expressions, we are able to determine an input for
a desired state
2
https://en.wikipedia.org/wiki/Satisfiability modulo theories#SMT solvers
3
http://smtlib.cs.uiowa.edu
18
SBA vs DBA
• Static Binary Analysis
• Full CFG
• No concrete value
• Often based on abstract analysis
• Scalable
• False positive
• Too complicated for analyze obfuscated code
• Dynamic Binary Analysis
• Partial CFG (only one path at time)
• Concrete values
• Often based on concrete analysis
• Not scalable
• Less false positive
• Lots of static protections may be broken
19
Online vs offline analysis
• Online analysis
• Extract runtime information
• Inject runtime values
• Interact and modify the control flow
• Good for fuzzing
• Offline analysis
• Store the context of each program point into a database
• Apply post analysis
• Display the context information using both static and dynamic paradigms
• Good for reverse
20
Offline analysis good for reverse
Bin
DB
Dynamic
Extraction
Analysis
API
IDA
Static
Extraction
Figure 6: Example of an offline analysis infrastructure
21
Offline analysis and symbolic emulation
• Explore more than one path using symbolic emulation from a concrete path
• From one path emulate them all
ϕ1 ϕ2
ϕ3
ϕ4
ϕ5 ϕ6
Figure 7: Concrete execution
22
Offline analysis and symbolic emulation
• Keep both concrete and symbolic values of each symbolic variable
• Use the concrete value for the emulation part and the symbolic value for
expressions and models
• Get the model of the new branch and restore the concrete value of the symbolic
variable
ϕ1 ϕ2
ϕ3
ϕ4
ϕ5 ϕ6
¬π
π
Get model
Figure 8: Symbolic emulation from a concrete path
23
Offline analysis and symbolic emulation
• Concrete and emulated paths are merged with disjunctions to get a coverage
expression
ϕ1 ϕ2
ϕ3
ϕ4
ϕ5 ϕ6
¬π
π
Get model
pre ∧ (ϕ3 ∨ ϕ4) ∧ post
Figure 9: Disjunction of paths
24
The Triton [5] framework
Triton in a nutshell
• Dynamic Binary Analysis Framework
• x86 and x86 64 binaries analysis
• Dynamic Taint Analysis
• Dynamic Symbolic Execution
• Partial Symbolic Emulation
• Python or SMT semantics representation
• Simplification passes
• Python and C++ API
• Tracer independent
• A Pintool 4
is shipped with the project
• Free and opensource 5
4
https://software.intel.com/en-us/articles/pintool/
5
http://triton.quarkslab.com
26
The Triton’s design
Symbolic
Execution
Engine
Taint
Engine
IR
SMT2-Lib
Semantics
SMT
Solver
Interface
SMT
Optimization
Passes
Triton internal components
Multi-Arch
design
Multi-Arch
design
Taint
Engine
API
C++ / Python
Pin
Valgrind
DynamoRio
Unicorn
DB (e.g: mysql)
Example of Tracers
LibTriton.so
Figure 10: The Triton’s design
27
The Triton’s design
• libpintool.so
• Used as tracer to give the execution context to the Triton library
• Python bindings on some Pin’s features
• libtriton.so
• Takes as input opcodes and a potential context
• Contains all engines and analysis
• Python and C++ API
28
In what scenarios should I use Triton?
• If I want to use basic Pin’s features with Python bindings
• If I’m working on a trace and want to perform a taint or symbolic analysis
• If I want to simplify expressions using my own rules or those of z3 6
6
https://github.com/Z3Prover/z3
29
The classic count inst example
count = 0
def mycb(inst):
global count
count += 1
def fini():
print count
if __name__ == ’__main__’:
setArchitecture(ARCH.X86_64)
startAnalysisFromEntry()
addCallback(mycb, CALLBACK.BEFORE)
addCallback(fini, CALLBACK.FINI)
runProgram()
30
Can I use the libTriton into IDA?
Figure 11: Triton and RPyC 7
7
https://rpyc.readthedocs.org 31
Can I emulate code via the libTriton into IDA?
Figure 12: Symbolic Emulation into IDA
32
Simplify expressions
Simplify expressions
• Simplification passes may be applied at different levels:
• Runtime node assignment (registers, memory cells, volatile)
• Specific isolated expressions
• Triton allows you to:
• Apply your own transformation rules based on smart patterns
• Use z3 8
to apply transformations
8
(simplify <expr>)
34
Simplification passes at different levels
Instruction semantics
Instruction semantics
Instruction semantics
Instruction semantics
Simplification passes Memory assignment
Register assignment
Volatile assignment
Trace
Figure 13: Runtime simplification
35
Simplify expressions with your own rules
Rule example: A ⊕ A → A = 0
def xor(node):
if node.getKind() == AST_NODE.BVXOR:
if node.getChilds()[0] == node.getChilds()[1]:
return bv(0, node.getBitvectorSize())
return node
if __name__ == ’__main__’:
# [...]
recordSimplificationCallback(xor)
# [...]
36
Smart patterns matching
• Commutativity and patterns matching
• A smart equality (==) operator
>>> a | >>> a | >>> a
((0x1 * 0x2) & 0xFF) | (((0x1 * 0x2) & 0xFF) ^ 0x3) | (0x1 / 0x2)
>>> b | >>> b | >>> b
((0x2 * 0x1) & 0xFF) | (0x3 ^ ((0x2 * 0x1) & 0xFF)) | (0x2 / 0x1)
>>> a == b | >>> a == b | >>> a == b
True | True | False
37
Triton to Z3 and vice versa
[...] Triton’s AST Z3’s AST Triton’s AST [...]
simplifications simplifications simplifications
Figure 14: ASTtriton ←→ ASTz3
38
Simplify expressions via z3
>>> enableSymbolicZ3Simplification(True)
>>> a = ast.variable(newSymbolicVariable(8))
>>> b = ast.bv(0x38, 8)
>>> c = ast.bv(0xde, 8)
>>> d = ast.bv(0x4f, 8)
>>> e = a * ((b & c) | d)
>>> print e
(bvmul SymVar_0 (bvor (bvand (_ bv56 8) (_ bv222 8)) (_ bv79 8)))
>>> f = simplify(e)
>>> print f
(bvmul (_ bv95 8) SymVar_0)
39
Simplify expressions via z3
Note that solvers’ simplification does not converge to a more human readable
expression.
40
Analyse opaque predicates
Analyse opaque predicates
42
Analyse opaque predicates
Figure 15: Bogus Flow Control
43
Analyse opaque predicates
∀x, y (y < 10 ∨ x(x + 1) mod 2 == 0) is True
44
Analyse opaque predicates
Convert x and y as symbolic variable;
for basic block in graph do
for instruction in basic block do
triton.emulate(instruction);
if instruction.type is conditionnal jump and zf expression is symbolized then
Check if zf has solutions ;
end
end
end
45
Analyse opaque predicates (1)
x_addr = 0x601BE0
y_addr = 0x601BDC
x_symVar = convertMemyToSymVar(Memory(x_addr, CPUSIZE.DWORD))
y_symVar = convertMemyToSymVar(Memory(y_addr, CPUSIZE.DWORD))
46
Analyse opaque predicates (2)
graph = idaapi.FlowChart(idaapi.get_func(FUNCTION_ADDRESS))
for block in graph:
if block.startEA != 0x401637:
analyse_basic_block(block)
47
Analyse opaque predicates (3)
def analyse_basic_block(BB):
pc = BB.startEA
while pc <= BB.endEA:
instruction = triton.emulate(pc)
pc = triton.getSymbolicRegisterValue(triton.REG.RIP)
if instruction.isControlFlow():
break
...
zf_expr = triton.getFullAst(zf_expr.getAst())
eq_false = ast.assert_(ast.equal(zf_expr, ast.bvfalse()))
eq_true = ast.assert_(ast.equal(zf_expr, ast.bvtrue()))
48
Analyse opaque predicates (3)
models_true = triton.getModels(eq_true, 4)
models_false = triton.getModels(eq_false, 4)
addr_next = instruction.getNextAddress()
addr_jmp = instruction.getFirstOperand().getValue()
if len(models_true) != 0: # addr_jmp is not taken
bb = get_basic_block(addr_jmp)
dead_blocks.append(bb)
if len(models_false) != 0: # addr_next is not taken
bb = get_basic_block(addr_next)
dead_blocks.append(bb)
49
Analyse opaque predicates
Figure 16: Bogus Flow Control simplified with Triton
50
Analyse opaque predicates
First demo!
51
Reconstruct a CFG from trace dif-
ferential
Reconstruct a CFG from trace differential
Problem: Given two sequences what is the minimal edition distance?
T1 : A T C T G A T
T2 : A A T C T G A T
53
Reconstruct a CFG from trace differential
Levenshtein algorithm (dynamic programming)
T1 : A - T C T G A T
T2 : A A T C T G A T
54
Reconstruct a CFG from trace differential
We can see a trace as a DNA sequence on a bigger alphabet. Many algorithms have
been developed to analyze/compare a DNA sequence and they can be used on traces.
• Levenshtein algorithm: optimal alignment, if, else detection
• Suffix Tree: Longest repeated factor, loops detection
55
Reconstruct a CFG from trace differential
int f(int x) {
int result = 0;
result = x;
result = result >> 3;
if (result % 4 == 2) {
result += 5;
result = result + x;
}
result = result * 7;
return result;
}
Figure 17: Function f
56
0x400536 push rbp
0x400537 mov rbp, rsp
0x40053a mov dword ptr [rbp - 0x14], edi
0x40053d mov dword ptr [rbp - 4], 0
0x400544 mov eax, dword ptr [rbp - 0x14]
0x400547 mov dword ptr [rbp - 4], eax
0x40054a sar dword ptr [rbp - 4], 3
0x40054e mov eax, dword ptr [rbp - 4]
0x400551 cdq
0x400552 shr edx, 0x1e
0x400555 add eax, edx
0x400557 and eax, 3
0x40055a sub eax, edx
0x40055c cmp eax, 2
0x40055f jne 0x40056b
0x400561 add dword ptr [rbp - 4], 5
0x400565 mov eax, dword ptr [rbp - 0x14]
0x400568 add dword ptr [rbp - 4], eax
0x40056b mov edx, dword ptr [rbp - 4]
0x40056e mov eax, edx
0x400570 shl eax, 3
0x400573 sub eax, edx
0x400575 mov dword ptr [rbp - 4], eax
0x400578 mov eax, dword ptr [rbp - 4]
0x40057b pop rbp
57
Recover the algorithm of a VM
Recover the algorithm of a VM
Problem: Given a very secret algorithm obfuscated with a VM. How can we recover
the algorithm without fully reversing the VM?
59
Recover the algorithm of a VM
$ ./vm 1234
3920664950602727424
$ ./vm 326423564
16724117216240346858
60
Recover the algorithm of a VM
• The VM is too big to be analyzed statically in few minutes
• One trace gives you all information that you need
61
Recover the algorithm of a VM
• Use taint analysis to isolate VM’s handlers and their goal
Figure 18: VM handler and a taint analysis
62
Recover the algorithm of a VM
Triton tool
from triton import *
def sym(instruction):
if instruction.getAddress() == 0x4099B5:
taintRegister(REG.RAX)
def before(instruction):
if instruction.isTainted():
print instruction
if __name__ == ’__main__’:
setArchitecture(ARCH.X86_64)
startAnalysisFromEntry()
addCallback(sym, CALLBACK.BEFORE_SYMPROC)
addCallback(before, CALLBACK.BEFORE)
runProgram()
Output
mov rdx, qword ptr [rbp + rax*8 - 0x330]
shr rdx, cl ; First handler, RDX = 1234
mov qword ptr [rbp + rax*8 - 0x330], rdx
mov rax, qword ptr [rbp + rax*8 - 0x330]
mov qword ptr [rdx], rax
... ; All others VM’s handlers
mov rdx, qword ptr [rax]
mov rax, qword ptr [rax]
mov ecx, eax
shl rdx, cl ; Last handler, RDX = 3920664950602727424
mov qword ptr [rbp + rax*8 - 0x330], rdx
63
Recover the algorithm of a VM
• Use symbolic execution to extract the expression of the algorithm
• Create a script input ←→ hash
64
Recover the algorithm of a VM
Triton tool
def sym(instruction):
if instruction.getAddress() == 0x4099B5:
convertRegisterToSymbolicVariable(REG.RAX)
def before(instruction):
if instruction.getAddress() == 0x409A0B:
raxAst = getFullAst(
getSymbolicExpressionFromId(
getSymbolicRegisterId(REG.RAX)
).getAst())
print ’n[+] Generating input_to_hash.py.’
fd = open(’./input_to_hash.py’, ’w’)
fd.write(TEMPLATE_GENERATE_HASH % (raxAst))
fd.close()
print ’n[+] Generating hash_to_input.py.’
fd = open(’./hash_to_input.py’, ’w’)
fd.write(TEMPLATE_GENERATE_INPUT % (raxAst))
fd.close()
Output
$ ./triton ./solve-vm.py ./vm 1234
[+] Generating input_to_hash.py.
[+] Generating hash_to_input.py.
$ python ./input_to_hash.py 1234
3920664950602727424
$ python ./input_to_hash.py 8347324
15528411515173474176
$ python ./hash_to_input.py 15528411515173474176
...
[SymVar_0 = 2095535]
[SymVar_0 = 2093487]
[SymVar_0 = 2027951]
[SymVar_0 = 2029999]
[SymVar_0 = 2060719]
[SymVar_0 = 2062767]
$ ./vm 2093487
15528411515173474176
$ ./vm 2027951
15528411515173474176
$ ./vm 2060719
15528411515173474176
65
Recover the algorithm of a VM
Second demo!
66
Conclusion
Conclusion
• Lots of static protections may be broken from an unique trace
• Taint and symbolic analysis are really useful when reversing obfuscated code
• The best protection is MBA and bitwise operation
• Hard to detect patterns automatically
• Hard to simplify
68
Future Works
Future Works
• libTriton
• Improve the emulation part
• Paths and expressions merging
• Restructured DFG/CFG via a Python representation (WIP #282 #287)
• Trace differential on DNA-based algorithms
• Pattern matching via formal proof
• Internal GC to scale the memory consumption
70
Thanks
Any Questions?
71
Contact us
• Romain Thomas
• rthomas at quarkslab com
• @rh0main
• Jonathan Salwan
• jsalwan at quarkslab com
• @JonathanSalwan
• Triton team
• triton at quarkslab com
• @qb triton
• irc: #qb triton@freenode.org
72
References I
S. Bardin and P. Herrmann.
Structural testing of executables.
In First International Conference on Software Testing, Verification, and Validation,
ICST 2008, Lillehammer, Norway, April 9-11, 2008, pages 22–31, 2008.
P. Godefroid, J. de Halleux, A. V. Nori, S. K. Rajamani, W. Schulte, N. Tillmann,
and M. Y. Levin.
Automating software testing using program analysis.
IEEE Software, 25(5):30–37, 2008.
73
References II
P. Godefroid, N. Klarlund, and K. Sen.
DART: directed automated random testing.
In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language
Design and Implementation, Chicago, IL, USA, June 12-15, 2005, pages 213–223,
2005.
J. Newsome and D. X. Song.
Dynamic taint analysis for automatic detection, analysis, and
signaturegeneration of exploits on commodity software.
In Proceedings of the Network and Distributed System Security Symposium,
NDSS 2005, San Diego, California, USA, 2005.
74
References III
F. Saudel and J. Salwan.
Triton: A dynamic symbolic execution framework.
In Symposium sur la s´ecurit´e des technologies de l’information et des
communications, SSTIC, France, Rennes, June 3-5 2015, pages 31–54. SSTIC,
2015.
K. Sen, D. Marinov, and G. Agha.
CUTE: a concolic unit testing engine for C.
In Proceedings of the 10th European Software Engineering Conference held jointly
with 13th ACM SIGSOFT International Symposium on Foundations of Software
Engineering, 2005, Lisbon, Portugal, September 5-9, 2005, pages 263–272, 2005.
75

More Related Content

What's hot

PWNの超入門 大和セキュリティ神戸 2018-03-25
PWNの超入門 大和セキュリティ神戸 2018-03-25PWNの超入門 大和セキュリティ神戸 2018-03-25
PWNの超入門 大和セキュリティ神戸 2018-03-25Isaac Mathis
 
ソースコードレビューのススメ
ソースコードレビューのススメソースコードレビューのススメ
ソースコードレビューのススメKLab Inc. / Tech
 
すごい配列楽しく学ぼう
すごい配列楽しく学ぼうすごい配列楽しく学ぼう
すごい配列楽しく学ぼうxenophobia__
 
GraphX によるグラフ分析処理の実例と入門
GraphX によるグラフ分析処理の実例と入門GraphX によるグラフ分析処理の実例と入門
GraphX によるグラフ分析処理の実例と入門鉄平 土佐
 
静的解析を使った開発ツールの開発
静的解析を使った開発ツールの開発静的解析を使った開発ツールの開発
静的解析を使った開発ツールの開発Takuya Ueda
 
フリーでできるWebセキュリティ(burp編)
フリーでできるWebセキュリティ(burp編)フリーでできるWebセキュリティ(burp編)
フリーでできるWebセキュリティ(burp編)abend_cve_9999_0001
 
Homebrewによるソフトウェアの実装 (3)
Homebrewによるソフトウェアの実装 (3)Homebrewによるソフトウェアの実装 (3)
Homebrewによるソフトウェアの実装 (3)Yoshihiro Mizoguchi
 
初心者向けCTFのWeb分野の強化法
初心者向けCTFのWeb分野の強化法初心者向けCTFのWeb分野の強化法
初心者向けCTFのWeb分野の強化法kazkiti
 
Re永続データ構造が分からない人のためのスライド
Re永続データ構造が分からない人のためのスライドRe永続データ構造が分からない人のためのスライド
Re永続データ構造が分からない人のためのスライドMasaki Hara
 
ctfで学ぼうリバースエンジニアリング
ctfで学ぼうリバースエンジニアリングctfで学ぼうリバースエンジニアリング
ctfで学ぼうリバースエンジニアリングjunk_coken
 
暗認本読書会13 advanced
暗認本読書会13 advanced暗認本読書会13 advanced
暗認本読書会13 advancedMITSUNARI Shigeo
 
ラズパイでデバイスドライバを作ってみた。
ラズパイでデバイスドライバを作ってみた。ラズパイでデバイスドライバを作ってみた。
ラズパイでデバイスドライバを作ってみた。Kazuki Onishi
 
M5StackをRustで動かす
M5StackをRustで動かすM5StackをRustで動かす
M5StackをRustで動かすKenta IDA
 
社内のマニュアルをSphinxで作ってみた
社内のマニュアルをSphinxで作ってみた社内のマニュアルをSphinxで作ってみた
社内のマニュアルをSphinxで作ってみたIosif Takakura
 
CVE、JVN番号の取得経験者になろう!
CVE、JVN番号の取得経験者になろう!CVE、JVN番号の取得経験者になろう!
CVE、JVN番号の取得経験者になろう!kazkiti
 
暗号文のままで計算しよう - 準同型暗号入門 -
暗号文のままで計算しよう - 準同型暗号入門 -暗号文のままで計算しよう - 準同型暗号入門 -
暗号文のままで計算しよう - 準同型暗号入門 -MITSUNARI Shigeo
 
40歳過ぎてもエンジニアでいるためにやっていること
40歳過ぎてもエンジニアでいるためにやっていること40歳過ぎてもエンジニアでいるためにやっていること
40歳過ぎてもエンジニアでいるためにやっていることonozaty
 
Mathematicaが敬遠される理由とは
Mathematicaが敬遠される理由とはMathematicaが敬遠される理由とは
Mathematicaが敬遠される理由とはHirokazu Kobayashi
 

What's hot (20)

PWNの超入門 大和セキュリティ神戸 2018-03-25
PWNの超入門 大和セキュリティ神戸 2018-03-25PWNの超入門 大和セキュリティ神戸 2018-03-25
PWNの超入門 大和セキュリティ神戸 2018-03-25
 
Kotlinミニアンチパターン
KotlinミニアンチパターンKotlinミニアンチパターン
Kotlinミニアンチパターン
 
ソースコードレビューのススメ
ソースコードレビューのススメソースコードレビューのススメ
ソースコードレビューのススメ
 
すごい配列楽しく学ぼう
すごい配列楽しく学ぼうすごい配列楽しく学ぼう
すごい配列楽しく学ぼう
 
GraphX によるグラフ分析処理の実例と入門
GraphX によるグラフ分析処理の実例と入門GraphX によるグラフ分析処理の実例と入門
GraphX によるグラフ分析処理の実例と入門
 
静的解析を使った開発ツールの開発
静的解析を使った開発ツールの開発静的解析を使った開発ツールの開発
静的解析を使った開発ツールの開発
 
イエラエセキュリティMeet up 20210820
イエラエセキュリティMeet up 20210820イエラエセキュリティMeet up 20210820
イエラエセキュリティMeet up 20210820
 
フリーでできるWebセキュリティ(burp編)
フリーでできるWebセキュリティ(burp編)フリーでできるWebセキュリティ(burp編)
フリーでできるWebセキュリティ(burp編)
 
Homebrewによるソフトウェアの実装 (3)
Homebrewによるソフトウェアの実装 (3)Homebrewによるソフトウェアの実装 (3)
Homebrewによるソフトウェアの実装 (3)
 
初心者向けCTFのWeb分野の強化法
初心者向けCTFのWeb分野の強化法初心者向けCTFのWeb分野の強化法
初心者向けCTFのWeb分野の強化法
 
Re永続データ構造が分からない人のためのスライド
Re永続データ構造が分からない人のためのスライドRe永続データ構造が分からない人のためのスライド
Re永続データ構造が分からない人のためのスライド
 
ctfで学ぼうリバースエンジニアリング
ctfで学ぼうリバースエンジニアリングctfで学ぼうリバースエンジニアリング
ctfで学ぼうリバースエンジニアリング
 
暗認本読書会13 advanced
暗認本読書会13 advanced暗認本読書会13 advanced
暗認本読書会13 advanced
 
ラズパイでデバイスドライバを作ってみた。
ラズパイでデバイスドライバを作ってみた。ラズパイでデバイスドライバを作ってみた。
ラズパイでデバイスドライバを作ってみた。
 
M5StackをRustで動かす
M5StackをRustで動かすM5StackをRustで動かす
M5StackをRustで動かす
 
社内のマニュアルをSphinxで作ってみた
社内のマニュアルをSphinxで作ってみた社内のマニュアルをSphinxで作ってみた
社内のマニュアルをSphinxで作ってみた
 
CVE、JVN番号の取得経験者になろう!
CVE、JVN番号の取得経験者になろう!CVE、JVN番号の取得経験者になろう!
CVE、JVN番号の取得経験者になろう!
 
暗号文のままで計算しよう - 準同型暗号入門 -
暗号文のままで計算しよう - 準同型暗号入門 -暗号文のままで計算しよう - 準同型暗号入門 -
暗号文のままで計算しよう - 準同型暗号入門 -
 
40歳過ぎてもエンジニアでいるためにやっていること
40歳過ぎてもエンジニアでいるためにやっていること40歳過ぎてもエンジニアでいるためにやっていること
40歳過ぎてもエンジニアでいるためにやっていること
 
Mathematicaが敬遠される理由とは
Mathematicaが敬遠される理由とはMathematicaが敬遠される理由とは
Mathematicaが敬遠される理由とは
 

Similar to Dynamic Binary Analysis and Obfuscated Codes

Triton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON ChinaTriton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON ChinaWei-Bo Chen
 
Code Analysis-run time error prediction
Code Analysis-run time error predictionCode Analysis-run time error prediction
Code Analysis-run time error predictionNIKHIL NAWATHE
 
Code optimization in compiler design
Code optimization in compiler designCode optimization in compiler design
Code optimization in compiler designKuppusamy P
 
Memory Management with Java and C++
Memory Management with Java and C++Memory Management with Java and C++
Memory Management with Java and C++Mohammad Shaker
 
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareUsing Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareLastline, Inc.
 
Code and memory optimization tricks
Code and memory optimization tricksCode and memory optimization tricks
Code and memory optimization tricksDevGAMM Conference
 
Code and Memory Optimisation Tricks
Code and Memory Optimisation Tricks Code and Memory Optimisation Tricks
Code and Memory Optimisation Tricks Sperasoft
 
Sista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performanceSista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performanceESUG
 
Everybody be cool, this is a roppery!
Everybody be cool, this is a roppery!Everybody be cool, this is a roppery!
Everybody be cool, this is a roppery!zynamics GmbH
 
Mykhailo Zarai "Be careful when dealing with C++" at Rivne IT Talks
Mykhailo Zarai "Be careful when dealing with C++" at Rivne IT TalksMykhailo Zarai "Be careful when dealing with C++" at Rivne IT Talks
Mykhailo Zarai "Be careful when dealing with C++" at Rivne IT TalksVadym Muliavka
 
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...Cyber Security Alliance
 
Principal Sources of Optimization in compiler design
Principal Sources of Optimization in compiler design Principal Sources of Optimization in compiler design
Principal Sources of Optimization in compiler design LogsAk
 
[2011 CodeEngn Conference 05] Deok9 - DBI(Dynamic Binary Instrumentation)를 이용...
[2011 CodeEngn Conference 05] Deok9 - DBI(Dynamic Binary Instrumentation)를 이용...[2011 CodeEngn Conference 05] Deok9 - DBI(Dynamic Binary Instrumentation)를 이용...
[2011 CodeEngn Conference 05] Deok9 - DBI(Dynamic Binary Instrumentation)를 이용...GangSeok Lee
 
PANDEMONIUM: Automated Identification of Cryptographic Algorithms using Dynam...
PANDEMONIUM: Automated Identification of Cryptographic Algorithms using Dynam...PANDEMONIUM: Automated Identification of Cryptographic Algorithms using Dynam...
PANDEMONIUM: Automated Identification of Cryptographic Algorithms using Dynam...CODE BLUE
 
Grow and Shrink - Dynamically Extending the Ruby VM Stack
Grow and Shrink - Dynamically Extending the Ruby VM StackGrow and Shrink - Dynamically Extending the Ruby VM Stack
Grow and Shrink - Dynamically Extending the Ruby VM StackKeitaSugiyama1
 
Basic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsBasic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsRajendran
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsDebasish Ghosh
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)Qiangning Hong
 

Similar to Dynamic Binary Analysis and Obfuscated Codes (20)

Triton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON ChinaTriton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON China
 
Code Analysis-run time error prediction
Code Analysis-run time error predictionCode Analysis-run time error prediction
Code Analysis-run time error prediction
 
Code optimization in compiler design
Code optimization in compiler designCode optimization in compiler design
Code optimization in compiler design
 
Memory Management with Java and C++
Memory Management with Java and C++Memory Management with Java and C++
Memory Management with Java and C++
 
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareUsing Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
 
Code and memory optimization tricks
Code and memory optimization tricksCode and memory optimization tricks
Code and memory optimization tricks
 
Code and Memory Optimisation Tricks
Code and Memory Optimisation Tricks Code and Memory Optimisation Tricks
Code and Memory Optimisation Tricks
 
Sista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performanceSista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performance
 
Everybody be cool, this is a roppery!
Everybody be cool, this is a roppery!Everybody be cool, this is a roppery!
Everybody be cool, this is a roppery!
 
Mykhailo Zarai "Be careful when dealing with C++" at Rivne IT Talks
Mykhailo Zarai "Be careful when dealing with C++" at Rivne IT TalksMykhailo Zarai "Be careful when dealing with C++" at Rivne IT Talks
Mykhailo Zarai "Be careful when dealing with C++" at Rivne IT Talks
 
cb streams - gavin pickin
cb streams - gavin pickincb streams - gavin pickin
cb streams - gavin pickin
 
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...
 
Principal Sources of Optimization in compiler design
Principal Sources of Optimization in compiler design Principal Sources of Optimization in compiler design
Principal Sources of Optimization in compiler design
 
[2011 CodeEngn Conference 05] Deok9 - DBI(Dynamic Binary Instrumentation)를 이용...
[2011 CodeEngn Conference 05] Deok9 - DBI(Dynamic Binary Instrumentation)를 이용...[2011 CodeEngn Conference 05] Deok9 - DBI(Dynamic Binary Instrumentation)를 이용...
[2011 CodeEngn Conference 05] Deok9 - DBI(Dynamic Binary Instrumentation)를 이용...
 
PANDEMONIUM: Automated Identification of Cryptographic Algorithms using Dynam...
PANDEMONIUM: Automated Identification of Cryptographic Algorithms using Dynam...PANDEMONIUM: Automated Identification of Cryptographic Algorithms using Dynam...
PANDEMONIUM: Automated Identification of Cryptographic Algorithms using Dynam...
 
Vectorization in ATLAS
Vectorization in ATLASVectorization in ATLAS
Vectorization in ATLAS
 
Grow and Shrink - Dynamically Extending the Ruby VM Stack
Grow and Shrink - Dynamically Extending the Ruby VM StackGrow and Shrink - Dynamically Extending the Ruby VM Stack
Grow and Shrink - Dynamically Extending the Ruby VM Stack
 
Basic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsBasic terminologies & asymptotic notations
Basic terminologies & asymptotic notations
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
 

Recently uploaded

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Recently uploaded (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

Dynamic Binary Analysis and Obfuscated Codes

  • 1. Dynamic Binary Analysis and Obfuscated Codes How to don’t kill yourself when you reverse obfuscated codes. Jonathan Salwan and Romain Thomas St’Hack in Bordeaux, April 8, 2016 Quarkslab
  • 2. About us Romain Thomas • Security Research Engineer at Quarkslab • Working on obfuscation and software protection Jonathan Salwan • Security Research Engineer at Quarkslab • Ph.D student on Software Verification supervised by: • S´ebastien Bardin from CEA • Marie-Laure Potet from VERIMAG 2
  • 3. Roadmap of this talk 1. Obfuscation introduction 2. Dynamic Binary Analysis introduction 3. The Triton framework 4. Conclusion 5. Future works 3
  • 5. What is an obfuscation? Wikipedia: ”Obfuscation is the obscuring of intended meaning in communication, making the message confusing, willfully ambiguous, or harder to understand.” 1 1 https://en.wikipedia.org/wiki/Obfuscation 5
  • 6. Why softwares may contain obfuscated codes? • Intellectual property • DRM • Hiding secrets 6
  • 7. What kind of obfuscations may we find in modern softwares? • Opaque predicates • Control-flow flattening • Virtualization • MBA and bitwise operations • Use of uncommon instructions. 7
  • 8. Example: Opaque predicates • Objective: Create unreachable basic blocks • The constraint ¬π1 is always UNSAT • The basic block ϕ3 is never executed ϕ1 ϕ2 ϕ3 ϕ4 π1 ¬π1 Figure 1: Control Flow Graph 8
  • 9. Example: CFG flattened • Objective: Remove structured control flows • The basic block ϕ2 is now used as dispatcher • The dispatcher manages the control flow • Static analysis: hard to predict which basic block will be called next ϕ1 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6 Figure 2: Original CFG ϕ1 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6 Figure 3: Flattened CFG 9
  • 10. Example: Virtualization • Objective: Emulate the original code via a custom ISA (Instruction Set Architecture) • Example: xor R1, R2 push R1 push R2 mov eax, [esp] mov ebx, [esp - 0x4] xor eax, ebx push eax 10
  • 11. Example: Virtualization Figure 4: An example of a VM’s CFG 11
  • 12. Example: MBA and bitwise operations • Objective: Transform the normal form of an expression to a more complex one • The transformation output may also be transformed again and so one a + b = (a ∨ b) + (a ∧ b) a ∗ b = (a ∧ b) ∗ (a ∨ b) + (a ∧ ¬b) ∗ (¬a ∧ b) a ⊕ b = (a ∧ ¬b) ∨ (¬a ∧ b) a ⊕ b = ((a ∧ ¬a) ∧ (¬b ∨ ¬a)) ∧ ((a ∨ b) ∨ (¬b ∨ b)) 0 = (a ∨ b) − (a + b) + (a ∧ b) 12
  • 13. Example: Use of uncommon instructions • Objective: • Break your tools • Break your mind! • May transform classic operations using AVX and SSE Figure 5: Uncommon instructions 13
  • 14. Dynamic Binary Analysis Introduc- tion
  • 15. What is a DBA? • Dynamic Binary Analysis • Any way to analyze a binary dynamically • Most popular analysis • Dynamic information extraction • Dynamic taint analysis [4] • Dynamic symbolic execution [3, 2, 6, 1] 15
  • 16. Why use a DBA? • To get runtime values at each program point • To get the control flow for a given input • To follow the spread of a specific data 16
  • 17. What is a dynamic taint analysis? • Taint analysis is used to follow a specific information through a data flow • Cell memory • Register • The taint is spread at runtime • At each program point you are able to know what cells and registers interact with your initial value 17
  • 18. What is a dynamic symbolic execution? • A DSE is used to represent the control and the data flow of an execution into arithmetical expressions • These expressions may contain symbolic variables instead of concrete values • Using a SMT solver 23 on these expressions, we are able to determine an input for a desired state 2 https://en.wikipedia.org/wiki/Satisfiability modulo theories#SMT solvers 3 http://smtlib.cs.uiowa.edu 18
  • 19. SBA vs DBA • Static Binary Analysis • Full CFG • No concrete value • Often based on abstract analysis • Scalable • False positive • Too complicated for analyze obfuscated code • Dynamic Binary Analysis • Partial CFG (only one path at time) • Concrete values • Often based on concrete analysis • Not scalable • Less false positive • Lots of static protections may be broken 19
  • 20. Online vs offline analysis • Online analysis • Extract runtime information • Inject runtime values • Interact and modify the control flow • Good for fuzzing • Offline analysis • Store the context of each program point into a database • Apply post analysis • Display the context information using both static and dynamic paradigms • Good for reverse 20
  • 21. Offline analysis good for reverse Bin DB Dynamic Extraction Analysis API IDA Static Extraction Figure 6: Example of an offline analysis infrastructure 21
  • 22. Offline analysis and symbolic emulation • Explore more than one path using symbolic emulation from a concrete path • From one path emulate them all ϕ1 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6 Figure 7: Concrete execution 22
  • 23. Offline analysis and symbolic emulation • Keep both concrete and symbolic values of each symbolic variable • Use the concrete value for the emulation part and the symbolic value for expressions and models • Get the model of the new branch and restore the concrete value of the symbolic variable ϕ1 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6 ¬π π Get model Figure 8: Symbolic emulation from a concrete path 23
  • 24. Offline analysis and symbolic emulation • Concrete and emulated paths are merged with disjunctions to get a coverage expression ϕ1 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6 ¬π π Get model pre ∧ (ϕ3 ∨ ϕ4) ∧ post Figure 9: Disjunction of paths 24
  • 25. The Triton [5] framework
  • 26. Triton in a nutshell • Dynamic Binary Analysis Framework • x86 and x86 64 binaries analysis • Dynamic Taint Analysis • Dynamic Symbolic Execution • Partial Symbolic Emulation • Python or SMT semantics representation • Simplification passes • Python and C++ API • Tracer independent • A Pintool 4 is shipped with the project • Free and opensource 5 4 https://software.intel.com/en-us/articles/pintool/ 5 http://triton.quarkslab.com 26
  • 27. The Triton’s design Symbolic Execution Engine Taint Engine IR SMT2-Lib Semantics SMT Solver Interface SMT Optimization Passes Triton internal components Multi-Arch design Multi-Arch design Taint Engine API C++ / Python Pin Valgrind DynamoRio Unicorn DB (e.g: mysql) Example of Tracers LibTriton.so Figure 10: The Triton’s design 27
  • 28. The Triton’s design • libpintool.so • Used as tracer to give the execution context to the Triton library • Python bindings on some Pin’s features • libtriton.so • Takes as input opcodes and a potential context • Contains all engines and analysis • Python and C++ API 28
  • 29. In what scenarios should I use Triton? • If I want to use basic Pin’s features with Python bindings • If I’m working on a trace and want to perform a taint or symbolic analysis • If I want to simplify expressions using my own rules or those of z3 6 6 https://github.com/Z3Prover/z3 29
  • 30. The classic count inst example count = 0 def mycb(inst): global count count += 1 def fini(): print count if __name__ == ’__main__’: setArchitecture(ARCH.X86_64) startAnalysisFromEntry() addCallback(mycb, CALLBACK.BEFORE) addCallback(fini, CALLBACK.FINI) runProgram() 30
  • 31. Can I use the libTriton into IDA? Figure 11: Triton and RPyC 7 7 https://rpyc.readthedocs.org 31
  • 32. Can I emulate code via the libTriton into IDA? Figure 12: Symbolic Emulation into IDA 32
  • 34. Simplify expressions • Simplification passes may be applied at different levels: • Runtime node assignment (registers, memory cells, volatile) • Specific isolated expressions • Triton allows you to: • Apply your own transformation rules based on smart patterns • Use z3 8 to apply transformations 8 (simplify <expr>) 34
  • 35. Simplification passes at different levels Instruction semantics Instruction semantics Instruction semantics Instruction semantics Simplification passes Memory assignment Register assignment Volatile assignment Trace Figure 13: Runtime simplification 35
  • 36. Simplify expressions with your own rules Rule example: A ⊕ A → A = 0 def xor(node): if node.getKind() == AST_NODE.BVXOR: if node.getChilds()[0] == node.getChilds()[1]: return bv(0, node.getBitvectorSize()) return node if __name__ == ’__main__’: # [...] recordSimplificationCallback(xor) # [...] 36
  • 37. Smart patterns matching • Commutativity and patterns matching • A smart equality (==) operator >>> a | >>> a | >>> a ((0x1 * 0x2) & 0xFF) | (((0x1 * 0x2) & 0xFF) ^ 0x3) | (0x1 / 0x2) >>> b | >>> b | >>> b ((0x2 * 0x1) & 0xFF) | (0x3 ^ ((0x2 * 0x1) & 0xFF)) | (0x2 / 0x1) >>> a == b | >>> a == b | >>> a == b True | True | False 37
  • 38. Triton to Z3 and vice versa [...] Triton’s AST Z3’s AST Triton’s AST [...] simplifications simplifications simplifications Figure 14: ASTtriton ←→ ASTz3 38
  • 39. Simplify expressions via z3 >>> enableSymbolicZ3Simplification(True) >>> a = ast.variable(newSymbolicVariable(8)) >>> b = ast.bv(0x38, 8) >>> c = ast.bv(0xde, 8) >>> d = ast.bv(0x4f, 8) >>> e = a * ((b & c) | d) >>> print e (bvmul SymVar_0 (bvor (bvand (_ bv56 8) (_ bv222 8)) (_ bv79 8))) >>> f = simplify(e) >>> print f (bvmul (_ bv95 8) SymVar_0) 39
  • 40. Simplify expressions via z3 Note that solvers’ simplification does not converge to a more human readable expression. 40
  • 43. Analyse opaque predicates Figure 15: Bogus Flow Control 43
  • 44. Analyse opaque predicates ∀x, y (y < 10 ∨ x(x + 1) mod 2 == 0) is True 44
  • 45. Analyse opaque predicates Convert x and y as symbolic variable; for basic block in graph do for instruction in basic block do triton.emulate(instruction); if instruction.type is conditionnal jump and zf expression is symbolized then Check if zf has solutions ; end end end 45
  • 46. Analyse opaque predicates (1) x_addr = 0x601BE0 y_addr = 0x601BDC x_symVar = convertMemyToSymVar(Memory(x_addr, CPUSIZE.DWORD)) y_symVar = convertMemyToSymVar(Memory(y_addr, CPUSIZE.DWORD)) 46
  • 47. Analyse opaque predicates (2) graph = idaapi.FlowChart(idaapi.get_func(FUNCTION_ADDRESS)) for block in graph: if block.startEA != 0x401637: analyse_basic_block(block) 47
  • 48. Analyse opaque predicates (3) def analyse_basic_block(BB): pc = BB.startEA while pc <= BB.endEA: instruction = triton.emulate(pc) pc = triton.getSymbolicRegisterValue(triton.REG.RIP) if instruction.isControlFlow(): break ... zf_expr = triton.getFullAst(zf_expr.getAst()) eq_false = ast.assert_(ast.equal(zf_expr, ast.bvfalse())) eq_true = ast.assert_(ast.equal(zf_expr, ast.bvtrue())) 48
  • 49. Analyse opaque predicates (3) models_true = triton.getModels(eq_true, 4) models_false = triton.getModels(eq_false, 4) addr_next = instruction.getNextAddress() addr_jmp = instruction.getFirstOperand().getValue() if len(models_true) != 0: # addr_jmp is not taken bb = get_basic_block(addr_jmp) dead_blocks.append(bb) if len(models_false) != 0: # addr_next is not taken bb = get_basic_block(addr_next) dead_blocks.append(bb) 49
  • 50. Analyse opaque predicates Figure 16: Bogus Flow Control simplified with Triton 50
  • 52. Reconstruct a CFG from trace dif- ferential
  • 53. Reconstruct a CFG from trace differential Problem: Given two sequences what is the minimal edition distance? T1 : A T C T G A T T2 : A A T C T G A T 53
  • 54. Reconstruct a CFG from trace differential Levenshtein algorithm (dynamic programming) T1 : A - T C T G A T T2 : A A T C T G A T 54
  • 55. Reconstruct a CFG from trace differential We can see a trace as a DNA sequence on a bigger alphabet. Many algorithms have been developed to analyze/compare a DNA sequence and they can be used on traces. • Levenshtein algorithm: optimal alignment, if, else detection • Suffix Tree: Longest repeated factor, loops detection 55
  • 56. Reconstruct a CFG from trace differential int f(int x) { int result = 0; result = x; result = result >> 3; if (result % 4 == 2) { result += 5; result = result + x; } result = result * 7; return result; } Figure 17: Function f 56
  • 57. 0x400536 push rbp 0x400537 mov rbp, rsp 0x40053a mov dword ptr [rbp - 0x14], edi 0x40053d mov dword ptr [rbp - 4], 0 0x400544 mov eax, dword ptr [rbp - 0x14] 0x400547 mov dword ptr [rbp - 4], eax 0x40054a sar dword ptr [rbp - 4], 3 0x40054e mov eax, dword ptr [rbp - 4] 0x400551 cdq 0x400552 shr edx, 0x1e 0x400555 add eax, edx 0x400557 and eax, 3 0x40055a sub eax, edx 0x40055c cmp eax, 2 0x40055f jne 0x40056b 0x400561 add dword ptr [rbp - 4], 5 0x400565 mov eax, dword ptr [rbp - 0x14] 0x400568 add dword ptr [rbp - 4], eax 0x40056b mov edx, dword ptr [rbp - 4] 0x40056e mov eax, edx 0x400570 shl eax, 3 0x400573 sub eax, edx 0x400575 mov dword ptr [rbp - 4], eax 0x400578 mov eax, dword ptr [rbp - 4] 0x40057b pop rbp 57
  • 59. Recover the algorithm of a VM Problem: Given a very secret algorithm obfuscated with a VM. How can we recover the algorithm without fully reversing the VM? 59
  • 60. Recover the algorithm of a VM $ ./vm 1234 3920664950602727424 $ ./vm 326423564 16724117216240346858 60
  • 61. Recover the algorithm of a VM • The VM is too big to be analyzed statically in few minutes • One trace gives you all information that you need 61
  • 62. Recover the algorithm of a VM • Use taint analysis to isolate VM’s handlers and their goal Figure 18: VM handler and a taint analysis 62
  • 63. Recover the algorithm of a VM Triton tool from triton import * def sym(instruction): if instruction.getAddress() == 0x4099B5: taintRegister(REG.RAX) def before(instruction): if instruction.isTainted(): print instruction if __name__ == ’__main__’: setArchitecture(ARCH.X86_64) startAnalysisFromEntry() addCallback(sym, CALLBACK.BEFORE_SYMPROC) addCallback(before, CALLBACK.BEFORE) runProgram() Output mov rdx, qword ptr [rbp + rax*8 - 0x330] shr rdx, cl ; First handler, RDX = 1234 mov qword ptr [rbp + rax*8 - 0x330], rdx mov rax, qword ptr [rbp + rax*8 - 0x330] mov qword ptr [rdx], rax ... ; All others VM’s handlers mov rdx, qword ptr [rax] mov rax, qword ptr [rax] mov ecx, eax shl rdx, cl ; Last handler, RDX = 3920664950602727424 mov qword ptr [rbp + rax*8 - 0x330], rdx 63
  • 64. Recover the algorithm of a VM • Use symbolic execution to extract the expression of the algorithm • Create a script input ←→ hash 64
  • 65. Recover the algorithm of a VM Triton tool def sym(instruction): if instruction.getAddress() == 0x4099B5: convertRegisterToSymbolicVariable(REG.RAX) def before(instruction): if instruction.getAddress() == 0x409A0B: raxAst = getFullAst( getSymbolicExpressionFromId( getSymbolicRegisterId(REG.RAX) ).getAst()) print ’n[+] Generating input_to_hash.py.’ fd = open(’./input_to_hash.py’, ’w’) fd.write(TEMPLATE_GENERATE_HASH % (raxAst)) fd.close() print ’n[+] Generating hash_to_input.py.’ fd = open(’./hash_to_input.py’, ’w’) fd.write(TEMPLATE_GENERATE_INPUT % (raxAst)) fd.close() Output $ ./triton ./solve-vm.py ./vm 1234 [+] Generating input_to_hash.py. [+] Generating hash_to_input.py. $ python ./input_to_hash.py 1234 3920664950602727424 $ python ./input_to_hash.py 8347324 15528411515173474176 $ python ./hash_to_input.py 15528411515173474176 ... [SymVar_0 = 2095535] [SymVar_0 = 2093487] [SymVar_0 = 2027951] [SymVar_0 = 2029999] [SymVar_0 = 2060719] [SymVar_0 = 2062767] $ ./vm 2093487 15528411515173474176 $ ./vm 2027951 15528411515173474176 $ ./vm 2060719 15528411515173474176 65
  • 66. Recover the algorithm of a VM Second demo! 66
  • 68. Conclusion • Lots of static protections may be broken from an unique trace • Taint and symbolic analysis are really useful when reversing obfuscated code • The best protection is MBA and bitwise operation • Hard to detect patterns automatically • Hard to simplify 68
  • 70. Future Works • libTriton • Improve the emulation part • Paths and expressions merging • Restructured DFG/CFG via a Python representation (WIP #282 #287) • Trace differential on DNA-based algorithms • Pattern matching via formal proof • Internal GC to scale the memory consumption 70
  • 72. Contact us • Romain Thomas • rthomas at quarkslab com • @rh0main • Jonathan Salwan • jsalwan at quarkslab com • @JonathanSalwan • Triton team • triton at quarkslab com • @qb triton • irc: #qb triton@freenode.org 72
  • 73. References I S. Bardin and P. Herrmann. Structural testing of executables. In First International Conference on Software Testing, Verification, and Validation, ICST 2008, Lillehammer, Norway, April 9-11, 2008, pages 22–31, 2008. P. Godefroid, J. de Halleux, A. V. Nori, S. K. Rajamani, W. Schulte, N. Tillmann, and M. Y. Levin. Automating software testing using program analysis. IEEE Software, 25(5):30–37, 2008. 73
  • 74. References II P. Godefroid, N. Klarlund, and K. Sen. DART: directed automated random testing. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, Chicago, IL, USA, June 12-15, 2005, pages 213–223, 2005. J. Newsome and D. X. Song. Dynamic taint analysis for automatic detection, analysis, and signaturegeneration of exploits on commodity software. In Proceedings of the Network and Distributed System Security Symposium, NDSS 2005, San Diego, California, USA, 2005. 74
  • 75. References III F. Saudel and J. Salwan. Triton: A dynamic symbolic execution framework. In Symposium sur la s´ecurit´e des technologies de l’information et des communications, SSTIC, France, Rennes, June 3-5 2015, pages 31–54. SSTIC, 2015. K. Sen, D. Marinov, and G. Agha. CUTE: a concolic unit testing engine for C. In Proceedings of the 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2005, Lisbon, Portugal, September 5-9, 2005, pages 263–272, 2005. 75