Advantages
Disassembly vs. Source
Notice vulnerable code
Find other uses
Question: What else uses this vulnerable function?
Answer: [ List of Programs ]
AcroForm.api
Vulnerable libtiff
We need fast lookup!
So how does one store functions?
Compilers screw with us …
• Random register assignment
• Reordering instructions
• Switching mnemonics
For now we have three ways ...
Small Prime Product
• Positive 64-bit integer number
• Characteristic for a function
• Assign a small prime to each mnemonic
• Multiply the primes of all instructions in the function
• Two functions are considered equal if they
have the same list of mnemonics
– Order of mnemonics is ignored
• Match quality: High
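A minimal sketch of the prime product idea, in Python. The helper names, the prime assignment (sorted mnemonics mapped to the first primes), and the truncation to 64 bits by masking are assumptions made for illustration; the slides only state that the result is a positive 64-bit integer.

```python
from itertools import count

MASK64 = (1 << 64) - 1  # assumption: keep the product in an unsigned 64-bit range


def first_primes(n):
    """Return the first n primes by trial division (fine for a few hundred)."""
    primes = []
    for candidate in count(2):
        if len(primes) == n:
            return primes
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)


def build_prime_table(mnemonics):
    """Assign one small prime to each distinct mnemonic (sorted for stability)."""
    names = sorted(set(mnemonics))
    return dict(zip(names, first_primes(len(names))))


def prime_product(mnemonics, table):
    """Multiply the primes of all mnemonics; instruction order does not matter."""
    product = 1
    for mnemonic in mnemonics:
        product = (product * table[mnemonic]) & MASK64
    return product
```

Because multiplication is commutative, two functions whose instructions were merely reordered by the compiler get the same value, and register assignment never enters the computation.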
MD-Index
• Structural lookup in a database would be great
• Erm … but a graph is not a number
• We want a hash function for graphs!
• Take every edge in the graph
• For every edge, construct 5-tuple:
– # of incoming edges in the source
– # of outgoing edges in the source
– # of incoming edges in the target
– # of outgoing edges in the target
– Topological order of the edge in the graph
• So a graph gives us a set of vectors
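The tuple construction above can be sketched as follows. This is an illustration under assumptions: the graph is given as a list of (source, target) edges, every block is reachable from the entry block, and the breadth-first depth of the source block stands in for the "topological order" of an edge (flow graphs with loops have no strict topological order, so some convention is needed). The function name is hypothetical.

```python
from collections import defaultdict, deque


def edge_tuples(edges, entry):
    """Return one 5-tuple per edge, sorted so the result is order-independent."""
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    succ = defaultdict(list)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
        succ[src].append(dst)

    # Assumption: BFS depth from the entry block approximates topological order.
    depth = {entry: 0}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for nxt in succ[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)

    # (in-degree of source, out-degree of source,
    #  in-degree of target, out-degree of target, position of the edge)
    return sorted(
        (indeg[s], outdeg[s], indeg[d], outdeg[d], depth[s])
        for s, d in edges
    )
```

Note that two edges can produce the same tuple, so the result is really a multiset; the sketch keeps duplicates.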
• A set of vectors is not exactly a number
• Embed each vector into the reals:
– Map each 5-tuple to a single real number
– The target is a 5-dimensional vector space over Q
– Each element is also “just” a number
• Apply a function to each embedded value
• Now mix all the results into a single number
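One way to turn the tuples into a single number, sketched in Python. The specific choices here are assumptions and are not taken from the text above: the basis 1, √2, √3, √5, √7 (linearly independent over Q, so distinct integer tuples map to distinct reals), the per-tuple function 1/√x, and summation as the mixing step.

```python
import math

# Assumed basis: 1 and the square roots of the first four primes.
WEIGHTS = (1.0, math.sqrt(2), math.sqrt(3), math.sqrt(5), math.sqrt(7))


def embed(t):
    """Map a 5-tuple of integers to one real number."""
    return sum(w * x for w, x in zip(WEIGHTS, t))


def md_index(tuples):
    """Mix all embedded tuples into one floating-point hash value."""
    # The source block of an edge has out-degree >= 1, so embed(t) > 0.
    # math.fsum makes the result independent of the order of the tuples.
    return math.fsum(1.0 / math.sqrt(embed(t)) for t in tuples)
```

The result is a floating-point number, so a database lookup would need a fixed textual or rounded representation; that detail is left open here.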
MD-Index with calls
• Just the flowgraph is too false-positive prone
• Encode the call positions, too
• Result: Hash function for flowgraph with calls
at particular locations
3-tiered lookup
• Does the prime product match?
– If yes, high confidence in correct match
• Does the MD-Index with calls match?
– If yes, medium confidence in correct match
• Does the MD-Index without calls match?
– If yes, low confidence in correct match
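The three tiers can be sketched as a cascade of dictionary lookups. The database layout (a dict with one index per hash type) and all names are hypothetical; only the order of the checks and the confidence labels come from the list above.

```python
def lookup(db, prime_product, md_index_calls, md_index_plain):
    """Return (confidence, matches) from the first tier that has a hit."""
    tiers = (
        ("high", "prime_product", prime_product),
        ("medium", "md_index_calls", md_index_calls),
        ("low", "md_index_plain", md_index_plain),
    )
    for confidence, index_name, key in tiers:
        matches = db[index_name].get(key)
        if matches:
            return confidence, matches
    return None, []
```

Usage with a toy database:

```python
db = {
    "prime_product": {2310: ["memcpy_wrapper"]},
    "md_index_calls": {"1.53": ["parse_header"]},
    "md_index_plain": {"1.41": ["parse_header", "parse_footer"]},
}
print(lookup(db, 2310, "0.00", "0.00"))   # ('high', ['memcpy_wrapper'])
print(lookup(db, 1, "1.53", "1.41"))      # ('medium', ['parse_header'])
```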
Problems
• Comparison process is not very robust to
changes in flow graphs
– BinDiff can do a lot more
– For most uses sufficient
• Comparison does not work for tiny functions
– Where tiny means fewer than 8 edges
– Context is not considered
She has a problem
Dozens of previously analyzed rootkits
New suspicious file
If she only knew
She came across that malware author two years ago
He reused his rootkit hiding code and she documented it back then
How to find similar files?
Remember fuzziness
Here is what we do ...
So we have this database, but ...
One file typically contains several different statically linked and dynamically imported libraries
Calculating a file score
• Calculate a score that depends on the number
of matches weighted by their quality
• The higher the score, the more significant
functions are shared by two files
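A possible shape for such a score, as a sketch only. The weights below are invented for illustration; the actual score calculation is not specified here and, as noted under Problems, depends on the goal.

```python
# Hypothetical weights per match quality; not taken from BinCrowd.
QUALITY_WEIGHTS = {"high": 3.0, "medium": 2.0, "low": 1.0}


def file_score(match_qualities):
    """Sum the weights of all function matches between two files."""
    return sum(QUALITY_WEIGHTS[q] for q in match_qualities)
```

For example, two high-quality matches and one low-quality match give `file_score(["high", "high", "low"])`, which is 7.0 with these weights.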
Problems
• We are still working on score calculation
• Desired score depends on goal
– Comment porting, library identification, ...
They have a problem
Complex team with different sub-teams
Information flow restricted by clearance levels
If they only knew ...
BinCrowd manages different access levels in a centralized way
No data transfer from high clearance people to low clearance people
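The rule can be read as a simple ordering check. This is a sketch of that reading, assuming clearance levels are integers where a larger number means higher clearance; it does not describe how BinCrowd implements access control.

```python
def can_read(reader_level, data_level):
    """A reader sees data only at or below their own clearance level."""
    return reader_level >= data_level
```

With this check, information uploaded at level 2 is visible to level 2 and level 3 users but never flows down to level 1.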
They have another problem
Different members use different tools
Making new information available to other members is difficult
If they only knew ...
BinCrowd makes it easy to exchange information between different tools
Individual members can use whatever tools they want
How do you actually access it?
We host a free community server
Here is what you need ...
We have a prepopulated database where you can download and upload information.
Software you need
• IDA Pro 5.6
• IDAPython 1.3.2
• A BinCrowd account (free)
• The BinCrowd IDA Pro Plugin
– http://github.com/zynamics
Usage
• Register BinCrowd account
• Download the BinCrowd IDA Plugin
• Load BinCrowd IDA Plugin using ALT-9 in IDA
• Read the readme.txt file to find out what
CTRL-1, CTRL-2, CTRL-3, and CTRL-4 do
Best practices
• Name your input files like
program.version.compiler.optimization_level.xxx
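A small sketch of how a file name in this scheme could be split back into its fields. It assumes the program name contains no dots while the version may (for example 3.9.2); the function name and the sample file name are hypothetical.

```python
def parse_input_name(filename):
    """Split program.version.compiler.optimization_level.ext into fields."""
    parts = filename.split(".")
    if len(parts) < 5:
        raise ValueError("expected program.version.compiler.optimization_level.ext")
    compiler, optimization, extension = parts[-3:]
    return {
        "program": parts[0],
        "version": ".".join(parts[1:-3]),  # the version may itself contain dots
        "compiler": compiler,
        "optimization_level": optimization,
        "extension": extension,
    }
```

For instance, `parse_input_name("libtiff.3.9.2.gcc.O2.so")` yields program `libtiff`, version `3.9.2`, compiler `gcc`, optimization level `O2`, and extension `so`.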
A fair warning
• Passwords are transmitted in plain text
• Database will be reset randomly during beta
– All data will be lost, accounts will be kept
• Cross-site request forgeries are a dime a dozen
Credits and Thanks
• Nathan Fain
– For getting the first version of BinCrowd off the
ground
• Christian Ketterer
– For designing the web interface
• American Greetings
– Thanks in advance for not suing us over our liberal
use of Care Bears when you guys find this
presentation
BinCrowd can be used for free!
Give it a try at
http://bincrowd.zynamics.com