Ben Agre - Adding Another Level of Hell to Reverse Engineering

Adding Another Level of Hell to Reverse Engineering, OR Static Binary Obfuscation Using Opaque Predicates and Semi-Junk Code. Ben Agre (@sboxkid), MIT, Raytheon SI

Who am I: Ben Agre. Reverse engineer. Worked random places. Currently work for Raytheon SI. Done random things. Kind of an asshole. Currently a student at MIT.

Obligatory term slide: SDLC, Sandbox, APT, Cyber Pompeii, Cyber Eyjafjallajökull (credit to Jon Oberheide), Stuxnet.

Overview: Introduction to x86. Overview of current packers. Overview of current ways to beat packers. Why this is different / why I’m an asshole.

Assumptions: We assume 32-bit x86 assembly. This can be extended to 64 bits, and would work better there, but it was originally written for 32. All functions are assumed to use the cdecl calling convention. I don’t like my friends; that’s why I built this tool.

X86 Assembly: I apologize to those of you who know assembly; this is going to be review at best, and boring to tears at worst. Instructions are variable-length and not aligned, hence the byte offset at which decoding starts matters. The smallest instruction is one byte and the largest is 15; anything longer raises a #GP (general protection) exception.

Eflags: Eflags is essentially the status register. It contains 32 bits, some of which are the flags used by conditional jumps. Important flags: ZF = Zero Flag, SF = Sign Flag, OF = Overflow Flag, CF = Carry Flag.

Basics: mov r1, r2/imm1 moves register r2 or an immediate into r1. add/sub r1, r2 applies the operation to r1 and r2, stores the result in r1, and modifies eflags appropriately. xor r1, r2 eXclusive ORs r1 and r2, stores the result in r1, and modifies eflags appropriately. jmp jumps to a chunk of code.

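More Commands: imul and idiv are signed multiply and divide (mul and div are the unsigned forms). The one-operand forms work on edx:eax; imul sets CF and OF, while idiv leaves the flags undefined. call addr calls a function.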

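Conditional Jumps: JS (jump if the sign flag is set), JE (jump if equal), JG (jump if greater, signed), JLE (jump if less or equal, signed), JZ (jump if the zero flag is set), JNZ (jump if the zero flag is not set). These all jump on the state of eflags.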
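
As a rough illustration of how eflags drives these jumps, here is a minimal Python sketch (not part of the tool) that models the four flags after a 32-bit sub or cmp and evaluates a few jump conditions. The names sub_flags and jump_taken are made up for this example.

```python
MASK = 0xFFFFFFFF  # 32-bit registers

def sub_flags(a, b):
    """Return the flags a 32-bit `sub a, b` (or `cmp a, b`) would set."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "ZF": result == 0,                           # result is zero
        "SF": bool(result >> 31),                    # top bit of the result
        "CF": a < b,                                 # unsigned borrow
        "OF": bool(((a ^ b) & (a ^ result)) >> 31),  # signed overflow
    }

# Jump conditions expressed over the flags.
CONDITIONS = {
    "jz": lambda f: f["ZF"],
    "je": lambda f: f["ZF"],
    "jnz": lambda f: not f["ZF"],
    "js": lambda f: f["SF"],
    "jg": lambda f: not f["ZF"] and f["SF"] == f["OF"],
    "jle": lambda f: f["ZF"] or f["SF"] != f["OF"],
}

def jump_taken(mnemonic, flags):
    """True if the conditional jump would be taken with these flags."""
    return CONDITIONS[mnemonic](flags)
```

For example, cmp of a value with itself (modeled as sub_flags(5, 5)) sets the zero flag, so jz is taken and jnz is not.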

Now that we’re out of Narnia, let’s shake it up: Packers were originally meant to make executables smaller. They are now used to be an ass to reverse engineers. People have their favorites.

General Packer Magic: Mangle the IAT, so that on each outside function call it’s hard to figure out where things are going. Do some operation to all data (uncompress it). Usually add some anti-debugging magic: Armadillo does parent-child debugging; Themida does anything it can think of.

Current direction: Currently there is a large push towards virtual machines. This approach leads to near-generic defeats: one learns the VM’s language and deals with it. Tracing is a pain.

ASProtect: Some opaque predicates. Creates stack madness. Virtualizes many things.

Themida: “State of the art.” Uses highly virtualized systems. Locks the binary in every way it can. CISC architecture. Hates VMs.

Both have been kicked badly: Themida has had its full VM reversed by a pair of Chinese hackers; apparently it is a modified CISC architecture, or RISC for older versions. Softworm did amazing things in this respect. ASProtect has a thousand tutorials on how to beat it. These systems set a high initial bar to entry but give no continued protection.

Destroying Them: There is currently a pair of IDA modules for Themida decompiling being sold on the black market. This shows how broken this model can be at times. Packing, for all intents and purposes, is deterministic. Not IND-CCA secure.

Terms: This seems random but is important. Functionally isomorphic: two functions that do the same thing but look different. State isomorphic: two states that do the same thing but look different. Opaque predicate: a question which you know the answer to before you ask it. If a term doesn’t make sense, ask.

Let’s create a way that is different: Instead of virtualizing the entire system, let’s stick with x86. Instead of making one high bar of entry, let’s play against the tools. We can actually modify these binaries to the point where they won’t look the same. Example.

Previous work: Kenshoto MathIsHard; the binary is public, the packer is not. It does more function rearranging than function obfuscation. Some packers employ basic junk code, but it’s always actual junk. We use semi-junk.

What this is: It’s a packer which is state aware and uses that to its advantage. It adds little pieces of assembly to be executed. It also adds bytes from /dev/urandom in order to mess up instruction alignment. Non-deterministic. Always executes no matter how things change on the OS.

Why you care: It’s a bit different than the normal way. Instead of creating a high startup cost, we create a continued-use cost. It’s still straight x86 assembly no matter what. It uses the junk, so it’s hard to tell real code from fake.

Mode of operation: I take some function or group of functions from a fully compiled binary; let’s call the function A. I take A and reassemble it into A’. A’ is functionally isomorphic to A; however, A’ can look nothing like A. Opaque predicates are added, as well as the random bytes. The original function is nopped out. Functions become longer and have to be rewritten to the end of the program. Call indirection is added.

Objectives: Create a non-deterministic obfuscator. Make IDA DIAF. Make a semi-extensible intermediate representation of the assembly. Make my friends hate me. ??? Profit on the tears of my friends?

Why this is different: Randomization. In cryptography, to make things harder for an adversary you randomize your plaintext, making it plaintext aware. What this means: I can pass in a binary twice and get two completely different results.

Design Decisions: There are two separate ways we analyze the program. Previous state engine: analyze the program and look for opaque predicates; xor eax,eax is awesome for this. Created state engine, AKA dynamic state engine: can modify elements, and will use them until they change.

Call indirection: In our dynamic engine we at times have to fix things up. We also may not want to actually place function addresses in calls, since IDA uses these to recursively find functions.

What is a call: call 0xdeadb33f is push eip (the return address), then jmp 0xdeadb33f. What could a call be: push eip, push 0xdeadb33f, retn.
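
A toy Python model of the two sequences, assuming the stack is just a list and treating "eip" as the address of the instruction after the call. The helper names are hypothetical; the point is only that both sequences leave the same return address on the stack and end up with the same value in eip.

```python
def do_call(stack, return_addr, target):
    """`call target`: push the return address, then jump to the target."""
    stack.append(return_addr)   # push eip (address of the next instruction)
    return target               # jmp target: this is the new eip

def do_push_ret(stack, return_addr, target):
    """`push return_addr; push target; retn` reaches the same state."""
    stack.append(return_addr)   # push eip
    stack.append(target)        # push 0xdeadb33f
    return stack.pop()          # retn pops the top of the stack into eip
```

A disassembler that follows call instructions to discover functions sees a target in the first form and only a push and a retn in the second.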

Now how do we rewrite this with stubs: F(retnOffset, callAddress): Switch(retnOffset), Case x: Ret = retnOffset[x], push Ret, push callAddress, return. Each stub is essentially a mini function with a switch table. We pregenerate a lookup table (retnOffset). Based on the value, push the parent return address, then push the address of the function to call, then return. This calls callAddress, which will then return to the parent function, bypassing the stub on return.
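
The stub logic can be sketched in Python roughly as follows. This is a model of the control flow only, not the tool's code: the table contents, the addresses and the function names are invented for illustration.

```python
# Hypothetical pregenerated lookup table: index -> return address in the parent.
RETN_OFFSET = {0: 0x00401020, 1: 0x00401084}

def stub(stack, retn_index, call_address):
    """Model of one stub: push parent return address, push target, retn."""
    ret = RETN_OFFSET[retn_index]   # the switch on retnOffset
    stack.append(ret)               # push ret
    stack.append(call_address)      # push callAddress
    return stack.pop()              # retn: pops callAddress into eip

def callee_return(stack):
    """The callee's own ret pops the parent's address, skipping the stub."""
    return stack.pop()
```

After stub() returns, execution is at callAddress with only the parent's return address on the stack, so the callee's ret goes straight back to the parent.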

Other debated way to do this: A short call that pushes eip, push the function to go to, retn. The issue with this is that the call is easy to find.

A third way: Push the value to jmp to, either an offset or an address. Do essentially xchg [esp+4],[esp], then retn. Else do something like pop eax, jmp eax.

Finding opaque predicates: Some actions have definitive outcomes before they are ever used: xor r1,r1 and sub r2,r2. These will always set eflags in one specific way, or throw an exception.
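
A minimal Python sketch of spotting these instructions in a textual listing. It assumes one instruction per string in Intel syntax and only looks at full 32-bit registers; the function names are made up, and the real tool works on its own intermediate representation rather than on text.

```python
REGS32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}

def parse(line):
    """Split 'xor eax, eax' into ('xor', ['eax', 'eax'])."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip().lower() for op in rest.split(",")] if rest.strip() else []
    return mnemonic.lower(), operands

def sets_zero_flag(line):
    """True for `xor r, r` or `sub r, r`, which always leave ZF set."""
    mnemonic, ops = parse(line)
    return (mnemonic in ("xor", "sub") and len(ops) == 2
            and ops[0] == ops[1] and ops[0] in REGS32)

def find_opaque_sites(lines):
    """Indices after which the zero flag is known to be set."""
    return [i for i, line in enumerate(lines) if sets_zero_flag(line)]
```

Each index returned is a place where a conditional jump on ZF has a known outcome until something else modifies eflags.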

However, these are not the only predicates: Take JZ. If the jump is taken we know that the zero flag is set; else it’s not. Hence we can reason below it: add a JNZ, and then throw in some junk. We know that the jump will be taken and a valid code path followed, and our junk will still mess up IDA.
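
Here is one way the JZ/JNZ trick could look as a transformation over NASM-style text, written in Python as a sketch. The label scheme (real_0, real_1, ...) and the function name are assumptions, and os.urandom stands in for reading /dev/urandom.

```python
import os

def add_opaque_jnz(lines, junk_len=4, rand=os.urandom):
    """After every `jz`, add an always-taken `jnz` that skips random bytes."""
    out = []
    counter = 0
    for line in lines:
        out.append(line)
        if line.strip().lower().startswith("jz "):
            label = f"real_{counter}"   # hypothetical label name
            counter += 1
            junk = ", ".join(f"0x{b:02x}" for b in rand(junk_len))
            out.append(f"jnz {label}")  # ZF is clear on the fall-through path
            out.append(f"db {junk}")    # never executed, but gets disassembled
            out.append(f"{label}:")
    return out
```

Running it twice on the same input gives different junk bytes each time, while the executed path stays the same.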

Still too easy: JZ then JNZ is fairly easy to spot. Well, we could add some do-nothing instructions if we wanted. If we know that after the item is used there is nothing pertaining to EAX until a mov eax,[edx], we can throw in some instructions: add eax,ecx and xor eax,eax. These do not change the flow of the program, yet still make RE harder. This creates an isomorphic state.
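
The "nothing pertaining to EAX until it is overwritten" check can be sketched as a conservative scan over straight-line code. This Python version is an illustration under simplifying assumptions: it only treats mov as an overwrite, gives up at any control transfer, and ignores sub-registers (al, ax) and eflags, all of which a real state engine would have to track.

```python
import re

REGS = ("eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp")

def regs_in(text):
    """Return the 32-bit register names mentioned in an operand string."""
    return {r for r in REGS if re.search(rf"\b{r}\b", text)}

def is_dead(reg, lines):
    """True if `reg` is overwritten by a mov before anything reads it."""
    for line in lines:
        mnemonic, _, rest = line.strip().lower().partition(" ")
        ops = [o.strip() for o in rest.split(",")]
        if mnemonic == "mov" and len(ops) == 2:
            dst, src = ops
            # A memory destination such as [eax] reads the register.
            reads = regs_in(src) | (regs_in(dst) if "[" in dst else set())
            if reg in reads:
                return False
            if dst == reg:
                return True
        elif reg in regs_in(rest):
            return False   # any other mention is treated as a read
        elif mnemonic in ("call", "ret", "retn") or mnemonic.startswith("j"):
            return False   # control transfer: give up, assume live
    return False           # no overwrite seen: assume live
```

When is_dead("eax", ...) holds for the code that follows, instructions like add eax,ecx can be dropped in at that point without affecting the values the program later uses.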

Adding little stubs: Now that we have some instructions we can throw in, we can make what are essentially little sub-functions. We do some calculation with eax, push it onto the stack and, since we controlled the last few things we did, undo it.

Looks kinda like: JNZ (program logic). inc eax (makes eax not zero; compare and jump left out due to space constraints). add eax,edx (edx can be whatever, we don’t care). push eax. mov eax,[esp+88]. JNZ our code. After the JNZ, random bytes. pop eax. Their code, which overwrites eax before anything uses it.

Well, so far we’re still pretty easy: Let’s bend the program to our will with dynamic state isomorphisms. Calling conventions are awesome. CDECL means that the program makes some assumptions on function calls: EBX stays static. However, on a call there are no assumptions about eax, ecx, edx. That means we can mess with these before and after the called code executes, except eax after (it holds the return value).

Now we’re getting somewhere: We can change items before and after the code executes. We can also change items in the middle of execution. So if we do something where we know how it will modify eflags (xor eax,eax), and eflags is then changed a bit later without being used, we can add a jump that goes where we want, and just add junk afterwards.

Now why is this semi-junk: Because we can fix items up inside of these random little stubs. If we fix things up inside of these little stubs, then when people look for completely dead code to remove, it won’t be flagged. It also means that during execution a trace will get a lot of chaff from our items. It is hard to distinguish our code from program code.

We’re not deterministic: There are a lot of things that make this nondeterministic. Our semi-junk can take one of many separate indeterminate forms. Our prologue junk can be as long as we want and can redo or undo anything in a short or long version.
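
To make the "redo or undo anything, short or long" idea concrete, here is a small Python sketch that generates a random run of instructions on one register followed by their inverses, plus a toy evaluator showing that the register value comes back unchanged. The instruction set, the tuple format and the function names are assumptions for the example; eflags side effects are ignored here.

```python
import random

MASK = 0xFFFFFFFF  # 32-bit registers

def make_semi_junk(reg, length, rng):
    """`length` random instructions on `reg`, then their inverses in reverse."""
    forward, backward = [], []
    for _ in range(length):
        kind = rng.choice(("add", "xor", "inc"))
        if kind == "inc":
            forward.append(("inc", reg, None))
            backward.append(("dec", reg, None))
        else:
            k = rng.getrandbits(32)
            forward.append((kind, reg, k))
            backward.append(("sub" if kind == "add" else "xor", reg, k))
    return forward + backward[::-1]

def run(instrs, value):
    """Toy evaluator: apply the instructions to a single register value."""
    for op, _reg, k in instrs:
        if op == "add":
            value = (value + k) & MASK
        elif op == "sub":
            value = (value - k) & MASK
        elif op == "xor":
            value ^= k
        elif op == "inc":
            value = (value + 1) & MASK
        elif op == "dec":
            value = (value - 1) & MASK
    return value
```

Calling make_semi_junk("ecx", 5, random.Random()) repeatedly yields different instruction sequences, yet run() returns the starting value for each of them.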

Hence: Things look different every time we do packing. This means that each time a person wants to fix it up, they need to redo the entire process by hand. If we rearrange functions and then reapply the packer, the RE has to do it all again from scratch.

Other Features Not Discussed: Max length of basic blocks: no more than, let’s say, 5 lines can appear together; this is just a parameter. Tunable parameters for semi-junk code, hence one can have the preambles be short or long. One can also tell it to prefer registers.

Future Work: Add other architectures. Move from nasm to my own assembler (yet to be built). Maybe add some anti-debugging foo just for lulz.

Added bonus, FLIRT: FLIRT is based on signatures of functions. It relies heavily on prologues, hence if we randomize the prologues FLIRT no longer picks up the signatures. Makes static binaries so much worse than the amount that they already suck.
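
A deliberately naive Python illustration of why a changed prologue defeats a byte signature. Real FLIRT signatures are more involved than this prefix match; the matcher, the signature and the byte strings are assumptions made up for the example (a standard 32-bit prologue, and the same prologue with an inc ecx / dec ecx pair inserted).

```python
def matches(signature, code):
    """Naive prefix match; None in the signature is a wildcard byte."""
    if len(code) < len(signature):
        return False
    return all(s is None or s == c for s, c in zip(signature, code))

# push ebp; mov ebp, esp; sub esp, <any imm8>
SIGNATURE = [0x55, 0x89, 0xE5, 0x83, 0xEC, None]

original = bytes.fromhex("5589e583ec10")        # plain prologue
randomized = bytes.fromhex("55414989e583ec10")  # inc ecx / dec ecx inserted
```

Here matches(SIGNATURE, original) succeeds and matches(SIGNATURE, randomized) fails, even though both prologues set up the same stack frame.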

Field tests: Two groups, with 2 highly skilled, 1 skilled and 1 novice in each group. One group got the program before packing; one got the program after packing. The program calculated the sum of a Fibonacci sequence with memory, using two arrays: non-trivial but not the hardest. It also had some other random functions to mess with them: dropped privileges, changed prologues, and some other red herrings.

Results: Without packing, around half an hour. With, around 9. Novice gave up.

Tool Design: This tool is based on vtrace (thank you, Kenshoto). It uses nasm for assembling the instructions required. Functions are rewritten at the end of the program; it will add pages if necessary.

Tool Release: This tool will most likely be released in the next month, after finals. I added a feature three weeks ago and it borked so many things. It is based on vtrace, so one must download that separately. I’ll probably tweet it or something.

Thanks: For helping me design and build Thing1 Design d4s, Visi, Psifertex, Metr0, Nitrik. For just being epic Draugr Raid Gynophage Bliss Hates Irony Kenshoto. Prof Zeldovich and Rivest, both of whose classes were awesome. The busticati, forever busticating. The NY Crew, who are too many to name. And all not enumerated herein.

Release Addendum: Will probably be released after my finals, so around May 28th. I will most likely announce via Twitter, @sboxkid. Email me at bagre@mit.edu if you want to know anything else.
