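19. Binja API
● Python, C and C++ API (just what you would expect!)
● Some of the analysis features built into LLVM are missing (for example, integrated CFG traversal, uses, SSA, and records annotated with variable characteristics)
● Branches: basic blocks with function terminators (exits)
● Obtaining recorded state; some simple range analysis
● api.binary.ninja/search.html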
32. Agenda
1) IDA is not perfect
2) Binary Ninja IL
3) Practical (academic) & program analysis
a) Abstract interpretation
4) Binary Ninja plugin demo
5) Conclusion
Editor's Notes
This talk isn’t about a new fantastical analysis platform. This talk isn’t about how one tool is better than another. This talk isn’t about a new silver bullet.
This talk is about making simple and advanced static analysis techniques easy and available to everyone...
Joern - source code analyzer
There’s a lot available, but we all know we’re going to ignore all of them and go straight for IDA. Why? Because IDA is interactive and tweakable and customizable.
Let’s face it...IDA isn’t perfect.
I’m sure most of you have taken a shot at doing some automated analysis in IDA. Maybe you wanted to identify all the dynamically bounded memcpys. IDA has a Python API, how hard could it be?
Okay, let’s start by getting all the cross references to memcpy. Easy enough in the IDA API, we just iterate over the xrefs.
Now we need to see whether the size parameter of memcpy is constant. So we look up the calling convention of our architecture and find the third parameter. Our architecture is x86-32, so that means we need to model the stack. So now we jump back to the top of the basic block and start implementing instructions. Let’s start by implementing the pushes...oh wait, then we need to do the moves...but now we need to remember that ESP *and* EBP are stack pointers...etc. etc.
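To make the effort concrete, here is a minimal sketch of the stack modeling described above. It does not use the IDA API: the instructions are hypothetical `(mnemonic, operands...)` tuples invented for illustration, only `mov`, `push`, and `call` are handled, and the function name `third_stack_argument` is an assumption. A real version would need far more instructions, which is exactly the problem.

```python
# Toy model only: instructions are hypothetical (mnemonic, operands...) tuples,
# not output from IDA or any real disassembler.

def third_stack_argument(block):
    """Return the third cdecl argument at the first call, or None if unknown."""
    stack = []  # abstract stack; the last element is the top
    regs = {}   # register name -> known constant (absent means unknown)
    for mnemonic, *ops in block:
        if mnemonic == "mov":
            dst, src = ops
            if isinstance(src, int):
                regs[dst] = src           # mov reg, imm
            elif src in regs:
                regs[dst] = regs[src]     # mov reg, reg with a known value
            else:
                regs.pop(dst, None)       # value becomes unknown
        elif mnemonic == "push":
            (src,) = ops
            stack.append(src if isinstance(src, int) else regs.get(src))
        elif mnemonic == "call":
            # cdecl pushes arguments right to left, so the third argument
            # is the third slot from the top of the stack.
            return stack[-3] if len(stack) >= 3 else None
        else:
            # Unmodeled instruction: forget everything we were tracking.
            regs.clear()
            stack = [None] * len(stack)
    return None


constant_block = [
    ("mov", "eax", 16),
    ("push", "eax"),      # size
    ("push", "esi"),      # src
    ("push", "edi"),      # dst
    ("call", "memcpy"),
]
dynamic_block = [
    ("push", "ecx"),      # size comes from an unknown register
    ("push", "esi"),
    ("push", "edi"),
    ("call", "memcpy"),
]

print(third_stack_argument(constant_block))
print(third_stack_argument(dynamic_block))
```

Even this toy ignores ESP arithmetic, EBP-relative stores, and anything that crosses a basic block boundary.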
That’s a lot of work for such a simple analysis. There has to be a better way.
Cannot reason. McSema is not really that great
``class LowLevelILInstruction`` Low Level Intermediate Language Instructions are infinite length tree-based
instructions. Tree-based instructions use infix notation with the left hand operand being the destination operand.
Infix notation is thus more natural to read than other notations (e.g. x86 ``mov eax, 0`` vs. LLIL ``eax = 0``).
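The idea of a tree-based instruction rendered in infix notation can be sketched in a few lines. The classes below (`Reg`, `Const`, `BinOp`, `SetReg`) are hypothetical and written only for illustration; they are not Binary Ninja's actual classes or API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Reg:
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Const:
    value: int

    def __str__(self):
        return str(self.value)


# An expression is a leaf (register or constant) or a nested operation.
Expr = Union[Reg, Const, "BinOp"]


@dataclass(frozen=True)
class BinOp:
    op: str
    left: Expr
    right: Expr

    def __str__(self):
        # Operands can themselves be trees, so rendering recurses.
        return f"({self.left} {self.op} {self.right})"


@dataclass(frozen=True)
class SetReg:
    dest: Reg   # destination is the left-hand operand
    src: Expr

    def __str__(self):
        return f"{self.dest} = {self.src}"


simple = SetReg(Reg("eax"), Const(0))
nested = SetReg(Reg("eax"), BinOp("+", Reg("ebx"), BinOp("*", Reg("ecx"), Const(4))))
print(simple)   # eax = 0
print(nested)   # eax = (ebx + (ecx * 4))
```

Because each operand may be another expression, a single instruction can nest to any depth, which is one way to read "infinite length" in the docstring.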
Memcpy example with binary ninja here
https://gist.github.com/withzombies/75d12d8fa1237213beb7e82acbfc3b40
http://santos.cs.ksu.edu/schmidt/Escuela03/WSSA/talk1p.pdf
In one sense, every analysis based on abstract interpretation is a “predicate abstraction.” But the “logic” is weak — it supports conjunction (⊓) but not necessarily disjunction (⊔).
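One way to see the point about conjunction and disjunction is the interval domain. This domain is chosen here for illustration and is not taken from the linked slides. The meet of two intervals is exactly their intersection, but the join is the smallest enclosing interval, which can include values that neither input contains.

```python
# An interval is a (low, high) tuple of ints; None stands for the empty set.

def meet(a, b):
    """Greatest lower bound: exactly the intersection of the two intervals."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def join(a, b):
    """Least upper bound: the smallest single interval covering both."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def contains(interval, x):
    return interval is not None and interval[0] <= x <= interval[1]


small, large = (0, 2), (8, 10)
joined = join(small, large)

print(meet((0, 5), (3, 10)))   # (3, 5)
print(joined)                  # (0, 10)
# 5 is in neither input, yet the join admits it: "x in small OR x in large"
# cannot be expressed exactly as one interval.
print(contains(small, 5), contains(large, 5), contains(joined, 5))
```

The join is still sound, because it over-approximates the union, but it is not a precise disjunction.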
https://cs.au.dk/~amoeller/spa/spa.pdf
Here, the analysis could conclude that a and b are positive numbers in all possible executions at the end of the program. The sign of c is either positive or negative depending on the concrete execution, so the analysis must report ? for that variable. Altogether we have an abstract domain consisting of the five abstract values {+, −, 0, ?, ⊥}, which we can organize as follows with the least precise information at the top and the most precise information at the bottom: ? at the top; +, 0, and − side by side in the middle; ⊥ at the bottom. The ordering reflects the fact that ⊥ represents the empty set of integer values and ? represents the set of all integer values. This abstract domain is an example of a lattice. We continue the development of the sign analysis in Section 5.2, but we first need the mathematical foundation in place.
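The five-value sign domain from the quoted passage can be sketched directly. The function names (`alpha`, `leq`, `join`, `abstract_add`, `abstract_mul`) are assumptions made for this example, and the transfer functions are a minimal version rather than the full treatment in the linked text.

```python
BOT, NEG, ZERO, POS, TOP = "⊥", "-", "0", "+", "?"


def alpha(n):
    """Abstract a concrete integer to its sign."""
    return POS if n > 0 else NEG if n < 0 else ZERO


def leq(a, b):
    """Lattice order: ⊥ is below everything, ? is above everything."""
    return a == b or a == BOT or b == TOP


def join(a, b):
    """Least upper bound; two different signs can only be summarized as ?."""
    if leq(a, b):
        return b
    if leq(b, a):
        return a
    return TOP


def abstract_add(a, b):
    if BOT in (a, b):
        return BOT
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    return a if a == b else TOP   # e.g. (+) + (-) could be any sign


def abstract_mul(a, b):
    if BOT in (a, b):
        return BOT
    if ZERO in (a, b):
        return ZERO               # zero times anything is zero
    if TOP in (a, b):
        return TOP
    return POS if a == b else NEG


# c is positive on one path and negative on another, so the analysis reports ?.
print(join(alpha(3), alpha(-2)))       # ?
print(abstract_add(alpha(3), alpha(7)))  # +
print(abstract_mul(NEG, NEG))          # +
```

Merging the two paths for c loses the fact that c is never zero, which is the precision cost of using this small domain.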