1. Halvar Flake, Sebastian Porst
2. He has a problem • A huge disassembled binary • A statically linked library he is not aware of
3. If he only knew ... EScript.api (the Adobe Reader JavaScript engine) contains libjs (SpiderMonkey), an open-source JavaScript library.
4. He has a problem • Present: guessing from strings, FLIRT signatures • Future: BinCrowd
6. Advantages • Disassembly vs. source • Notice vulnerable code • Find other uses – Example: AcroForm.api contains a vulnerable libtiff – Question: What else uses this vulnerable function? – Answer: [List of programs]
  7. Technical Intermezzo A: Finding Functions
8. We need fast lookup! • So how does one store functions? • Compilers screw with us … – Random register assignment – Reordering instructions – Switching mnemonics • For now we have three ways ...
9. Small Prime Product • Positive 64-bit integer • Characteristic for a function • Assign a small prime to each mnemonic • Multiply the primes of all instructions in the function • Two functions are considered equal if they have the same list of mnemonics – Order of mnemonics is ignored • Match quality: High
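A minimal Python sketch of the small prime product. The mnemonic list and its prime assignment are hypothetical; a real database would need one fixed table covering the whole instruction set so that values stay comparable across binaries. The sketch also assumes that overflow wraps modulo 2^64, which the slide does not specify; for long functions this wrap-around gives up the uniqueness that prime factorization would otherwise guarantee.

```python
MASK64 = (1 << 64) - 1  # assumption: keep the product within 64 bits by wrapping

# Hypothetical, fixed mnemonic table; the order only decides which prime each gets.
MNEMONICS = ["mov", "push", "pop", "call", "ret", "add", "sub", "xor",
             "cmp", "test", "jmp", "jz", "jnz", "lea", "and", "or"]


def first_primes(n: int) -> list[int]:
    """Return the first n primes using simple trial division."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


PRIME_OF = dict(zip(MNEMONICS, first_primes(len(MNEMONICS))))


def small_prime_product(mnemonics: list[str]) -> int:
    """Multiply the primes of all mnemonics; multiplication ignores instruction order."""
    product = 1
    for mnemonic in mnemonics:
        product = (product * PRIME_OF[mnemonic]) & MASK64
    return product


# Two instruction sequences with the same mnemonics in a different order:
a = small_prime_product(["push", "mov", "call", "pop", "ret"])
b = small_prime_product(["mov", "push", "call", "pop", "ret"])
print(a == b)  # True
```

Because the product keeps each prime's multiplicity, `mov; mov` and `mov` get different values, while any permutation of the same instructions collides by design.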
  10. MD-Index • Structural lookup in a database would be great • Erm … but a graph is not a number • We want a hash function for graphs!
11. MD-Index • 80-bit hash value • Result: Fast DB lookup for particular functions
12. MD-Index • Take every edge in the graph • For every edge, construct a 5-tuple: – # of incoming edges of the source – # of outgoing edges of the source – # of incoming edges of the target – # of outgoing edges of the target – Topological order of the edge in the graph • So a graph gives us a set of vectors
13. MD-Index • A set of vectors is not exactly a number • Embed each vector into the reals: – Map each 5-tuple to a single real number – The images lie in a 5-dimensional vector space over Q – Each element is also “just” a number • Apply a further function to each value • Now mix all the results into a single value
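The exact embedding and mixing formulas are not spelled out here. A minimal sketch, assuming the embedding (x₁, …, x₅) ↦ x₁ + x₂√2 + x₃√3 + x₄√5 + x₅√7 (1, √2, √3, √5, √7 are linearly independent over Q, so this is injective on integer tuples) and mixing by summing 1/√(embedded value) over all edges. It further assumes that the "topological order" of an edge is the breadth-first depth of its source block from the entry block. Python floats are 64-bit doubles, not 80-bit values, and `math.fsum` is used so the result does not depend on edge order.

```python
import math
from collections import deque

# Assumed weights for the five tuple components.
WEIGHTS = (1.0, math.sqrt(2), math.sqrt(3), math.sqrt(5), math.sqrt(7))


def bfs_levels(edges, entry):
    """Breadth-first depth of every block reachable from the entry block.

    Used here as a stand-in for the 'topological order' of an edge.
    """
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    level = {entry: 0}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in level:
                level[nxt] = level[node] + 1
                queue.append(nxt)
    return level


def edge_tuples(edges, entry):
    """Build (in_src, out_src, in_dst, out_dst, order) for each edge."""
    indeg, outdeg = {}, {}
    for src, dst in edges:
        outdeg[src] = outdeg.get(src, 0) + 1
        indeg[dst] = indeg.get(dst, 0) + 1
    level = bfs_levels(edges, entry)
    return [
        (indeg.get(src, 0), outdeg[src], indeg[dst], outdeg.get(dst, 0), level.get(src, 0))
        for src, dst in edges
    ]


def md_index(edges, entry):
    """Embed each tuple into the reals and mix by summing reciprocal square roots."""
    terms = []
    for tup in edge_tuples(edges, entry):
        # Always > 0 because every edge source has at least one outgoing edge.
        embedded = math.fsum(x * w for x, w in zip(tup, WEIGHTS))
        terms.append(1.0 / math.sqrt(embedded))
    return math.fsum(terms)


diamond = [(0, 1), (0, 2), (1, 3), (2, 3)]
relabeled = [("b", "d"), ("a", "c"), ("a", "b"), ("c", "d")]
print(md_index(diamond, 0) == md_index(relabeled, "a"))  # True
```

The relabeled diamond has different block names and edge order but the same structure, so it lands on the same value; that is what makes the index usable as a database key.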
15. MD-Index with calls • The flowgraph alone is too prone to false positives • Encode the call positions, too • Result: Hash function for a flowgraph with calls at particular locations
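The slide does not say how call positions enter the hash. One hypothetical way, sketched below, is to extend each edge tuple with the number of calls in its source and target blocks and give those two components their own weights (√11 and √13, again an assumption), so that moving a call to a structurally different block changes the value. The embedding, mixing, and breadth-first "order" follow the same assumptions as a plain MD-Index sketch; the `calls` mapping is a hypothetical input.

```python
import math
from collections import deque

# Assumed weights: 1 for the first slot, then square roots of the next primes.
WEIGHTS = (1.0, math.sqrt(2), math.sqrt(3), math.sqrt(5),
           math.sqrt(7), math.sqrt(11), math.sqrt(13))


def bfs_levels(edges, entry):
    """Breadth-first depth of each block, standing in for topological order."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    level = {entry: 0}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in level:
                level[nxt] = level[node] + 1
                queue.append(nxt)
    return level


def md_index_with_calls(edges, entry, calls):
    """MD-Index variant whose edge tuples also carry per-block call counts.

    `calls` maps a block to the number of call instructions it contains;
    blocks without calls may be omitted.
    """
    indeg, outdeg = {}, {}
    for src, dst in edges:
        outdeg[src] = outdeg.get(src, 0) + 1
        indeg[dst] = indeg.get(dst, 0) + 1
    level = bfs_levels(edges, entry)
    terms = []
    for src, dst in edges:
        tup = (indeg.get(src, 0), outdeg[src], indeg[dst], outdeg.get(dst, 0),
               level.get(src, 0), calls.get(src, 0), calls.get(dst, 0))
        embedded = math.fsum(x * w for x, w in zip(tup, WEIGHTS))
        terms.append(1.0 / math.sqrt(embedded))
    return math.fsum(terms)


chain = [(0, 1), (1, 2)]
# Same flowgraph, call in the first block vs. call in the last block:
print(md_index_with_calls(chain, 0, {0: 1}) != md_index_with_calls(chain, 0, {2: 1}))  # True
```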
  16. 3-tiered lookup • Does the prime product match? – If yes, high confidence in correct match • Does the MD-Index with calls match? – If yes, medium confidence in correct match • Does the MD-Index without calls match? – If yes, low confidence in correct match
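The three tiers above can be sketched as an ordered series of exact-match lookups. `SignatureDatabase` and `FunctionSignature` are hypothetical stand-ins for the server-side database, not BinCrowd's actual schema, and the sketch assumes the floating-point MD-Index values are stored and compared exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionSignature:
    """The three per-function hashes used by the lookup (values are illustrative)."""
    prime_product: int
    md_index_calls: float
    md_index: float


class SignatureDatabase:
    """Hypothetical in-memory stand-in for the function database."""

    def __init__(self):
        self.by_prime = {}
        self.by_md_calls = {}
        self.by_md = {}

    def add(self, name, sig):
        """Index a known function under all three hashes."""
        self.by_prime.setdefault(sig.prime_product, []).append(name)
        self.by_md_calls.setdefault(sig.md_index_calls, []).append(name)
        self.by_md.setdefault(sig.md_index, []).append(name)

    def lookup(self, sig):
        """Try each tier in order and report the confidence of the first hit."""
        tiers = (
            (self.by_prime, sig.prime_product, "high"),
            (self.by_md_calls, sig.md_index_calls, "medium"),
            (self.by_md, sig.md_index, "low"),
        )
        for table, key, confidence in tiers:
            if key in table:
                return confidence, table[key]
        return None, []


db = SignatureDatabase()
db.add("inflate", FunctionSignature(2310, 3.25, 1.5))
print(db.lookup(FunctionSignature(9999, 3.25, 1.5)))  # ('medium', ['inflate'])
```

Checking the strictest hash first means a caller always gets the most trustworthy match available and can decide per confidence level whether to import names or comments automatically.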
17. Problems • The comparison process is not very robust to changes in flow graphs – BinDiff can do a lot more – Sufficient for most uses • Comparison does not work for tiny functions – Where tiny means fewer than 8 edges – Context is not considered
18. She has a problem • Dozens of previously analyzed rootkits • A new suspicious file
19. If she only knew ... She came across that malware author two years ago. He reused his rootkit hiding code, and she documented it back then.
21. Advantages • Remember the past • Import earlier results • Simplify the future
  22. Technical Intermezzo B: Calculating scores for files
23. How to find similar files? So we have this database, but ... one file typically contains several different statically linked and dynamically imported libraries. • Remember fuzziness • Here is what we do ...
24. Calculating a file score • Calculate a score that depends on the number of matches weighted by their quality • The higher the score, the more significant functions two files share
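A minimal sketch of a quality-weighted file score. The weights per confidence level are hypothetical, since the score calculation is still open; the file names in the example are illustrative.

```python
# Hypothetical weights per match confidence (high/medium/low from the 3-tiered lookup).
QUALITY_WEIGHT = {"high": 1.0, "medium": 0.5, "low": 0.25}


def file_score(matches: list[str]) -> float:
    """Sum the weights of all function matches two files share.

    `matches` lists the confidence level of each matched function pair.
    """
    return sum(QUALITY_WEIGHT[confidence] for confidence in matches)


# Rank candidate files by how much they share with the file under analysis.
candidates = {
    "sample_a.exe": ["high", "medium"],
    "sample_b.exe": ["low", "low", "low"],
}
ranked = sorted(candidates, key=lambda f: file_score(candidates[f]), reverse=True)
print(ranked)  # ['sample_a.exe', 'sample_b.exe']
```

With these weights, a few high-confidence matches outweigh many low-confidence ones; a different goal (comment porting vs. library identification) would call for different weights.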
25. Problems • We are still working on score calculation • The desired score depends on the goal – Comment porting, library identification, ...
26. They have a problem • A complex team with different sub-teams • Information flow restricted by clearance levels
27. If they only knew ... BinCrowd manages different access levels centrally. No data is transferred from people with high clearance to people with low clearance.
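The rule that data never flows from high to low clearance can be sketched as a simple read check. This is a hypothetical illustration, not BinCrowd's implementation: it assumes every uploaded item is tagged with a clearance level and that users may read only items at or below their own level.

```python
from enum import IntEnum


class Clearance(IntEnum):
    """Hypothetical clearance levels; a higher value means more access."""
    PUBLIC = 0
    INTERNAL = 1
    SECRET = 2


def can_read(user: Clearance, item: Clearance) -> bool:
    """Allow reading only items at or below the user's own level (no read-up)."""
    return user >= item


def visible_items(user: Clearance, items: list[tuple[str, Clearance]]) -> list[str]:
    """Return the names of all items this user is allowed to see."""
    return [name for name, level in items if can_read(user, level)]


items = [
    ("comment on zlib inflate", Clearance.PUBLIC),
    ("team notes on a rootkit", Clearance.INTERNAL),
    ("restricted analysis", Clearance.SECRET),
]
print(visible_items(Clearance.INTERNAL, items))
# ['comment on zlib inflate', 'team notes on a rootkit']
```

Doing this check on the central server, rather than in each client tool, is what keeps the policy consistent across sub-teams.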
28. They have another problem • Different members use different tools • Making new information available to other members is difficult
29. If they only knew ... BinCrowd makes it easy to exchange information between different tools. Individual members can use whatever tools they want.
31. Advantages • Central database of knowledge • Controlled transfer of information • Synchronization of information from different tools
  32. Technical Intermezzo C: Use BinCrowd
33. How do you actually access it? We host a free community server with a prepopulated database where you can download and upload information. Here is what you need ...
  34. Software you need • IDA Pro 5.6 • IDAPython 1.3.2 • A BinCrowd account (free) • The BinCrowd IDA Pro Plugin –
35. Usage • Register a BinCrowd account • Download the BinCrowd IDA plugin • Load the plugin in IDA using ALT-9 • Read the readme.txt file to find out what CTRL-1, CTRL-2, CTRL-3, and CTRL-4 do
36. Workflow • New IDB → Download prior results → Analyze IDB → Upload new results
37. Best practices • Name your input files like program.version.compiler.optimization_level.xxx
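A file name following this convention can be split back into its fields. A minimal sketch: since versions such as 3.9.2 contain dots themselves, the fixed fields are taken from the right, and it assumes the program name contains no dots. The sample name `libtiff.3.9.2.msvc9.O2.dll` is hypothetical.

```python
from typing import NamedTuple


class SampleName(NamedTuple):
    program: str
    version: str
    compiler: str
    optimization_level: str
    extension: str


def parse_sample_name(filename: str) -> SampleName:
    """Split program.version.compiler.optimization_level.ext into its fields.

    The version may itself contain dots, so compiler, optimization level,
    and extension are taken from the right; the program name is assumed
    to contain no dots.
    """
    parts = filename.split(".")
    if len(parts) < 5:
        raise ValueError(f"unexpected file name: {filename!r}")
    return SampleName(
        program=parts[0],
        version=".".join(parts[1:-3]),
        compiler=parts[-3],
        optimization_level=parts[-2],
        extension=parts[-1],
    )


print(parse_sample_name("libtiff.3.9.2.msvc9.O2.dll"))
# SampleName(program='libtiff', version='3.9.2', compiler='msvc9', optimization_level='O2', extension='dll')
```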
38. A fair warning • Passwords are transmitted in plain text • The database will be reset at random during the beta – All data will be lost; accounts will be kept • Cross-site request forgeries are a dime a dozen
  39. Credits and Thanks • Nathan Fain – For getting the first version of BinCrowd off the ground
40. Credits and Thanks • Christian Ketterer – For designing the web interface • American Greetings – Thanks in advance for not suing us over our liberal use of Care Bears when you guys find this presentation
  41. BinCrowd can be used for free! Give it a try at