
nullcon 2011 - Automatic Program Analysis using Dynamic Binary Instrumentation


Automatic Program Analysis using Dynamic Binary Instrumentation by Sunil Kumar

Published in: Technology
Transcript of "nullcon 2011 - Automatic Program Analysis using Dynamic Binary Instrumentation"

IVIZ TECHNO SOLUTIONS PVT. LTD.

Puncture
Automatic Program Analysis using Dynamic Binary Instrumentation

Sunil Kumar
sunil.kumar@ivizsecurity.com
2/14/2011

Dynamic Binary Instrumentation involves executing a given program in a controlled environment, sometimes over a VM, while tracing its runtime context and analyzing the program's behavior by introducing custom instrumentation code at various points during the lifetime of the program. In this paper we use PIN, a heavyweight instrumentation framework developed by Intel, to perform behavior analysis of binary programs automatically at runtime.
Abstract

Some of the major challenges in software security research include vulnerability identification/discovery, vulnerability analysis, exploit development and malicious software analysis. Identifying a vulnerability in software requires knowledge of the unintended or weakly coded parts of the software. One possible way to identify them is to look for the use of functions with well-known bugs, such as “strcpy”. In exploit development, one needs to know what input triggers the bug and how it was passed to the program. Malware is a piece of code that performs unwanted behaviour when executed. During analysis it is very important to know what this “unwanted behaviour” is. In this paper we attempt to address these challenges using PIN, a Dynamic Binary Instrumentation (DBI) engine developed by Intel Corporation. Although PIN supports many platforms, our discussion is mostly in the context of the Windows environment. Finally, we introduce a custom tool, developed mainly to learn and understand the internals of PIN, that can perform automatic behaviour analysis of programs.

Introduction

Using a debugger is one of the most common techniques for dynamic program analysis. Analysts attach debuggers to programs and set breakpoints at various addresses to identify which functions are called, and with what parameters, to perform the required tasks. This technique is used in vulnerability identification to find uses of known vulnerable functions and their parameters. Exploit writers use it to identify the actual input that triggered the bug, and the source of that input, by analysing runtime memory dumps. One problem with debuggers is that most of them rely on well-known APIs to function, and malware writers use anti-debug techniques to make debugging very difficult. This paper suggests PIN, a Dynamic Binary Instrumentation engine, as an alternative to debuggers for analysing programs.

PIN does not use the techniques used by debuggers, such as setting breakpoints, so it is capable of circumventing most anti-debug techniques. We developed a PinTool called “Puncture” to record all the activities a program performs with the Windows registry, files and network connections. The Pin APIs are explained in the context of Puncture for better understanding.
Binary Instrumentation

Instrumentation is a technique of inserting extra code into an application to observe its behaviour. Instrumentation can be performed at various stages: at source code level, at compile time, post link time or at run time.

Binary instrumentation is a way of analysing the behaviour of a program by inserting extra code at certain places in the program at runtime. It is very useful where source code is not available and one cannot insert extra lines and recompile. A typical example is the Microsoft Windows platform, where source code is typically not available and the kernel interface cannot be adapted to support observability.

Binary instrumentation creates a new version of the binary by inserting instrumentation code into it. For example, the binary can be instrumented to insert code before every instruction with a memory reference in order to simulate and control cache and memory operations.

With the features available through binary instrumentation, it is possible to do complete system emulation by providing custom system call interfaces, system and user binaries, devices and so on, giving the binary in question a sandbox-like environment. This makes it possible to analyse malware without compromising the real host system.

PIN

Pin is a Dynamic Binary Instrumentation engine developed by Intel Corporation. Pin is based on the post-link optimizer “Spike”. Pin can perform software instrumentation on Windows, Linux and MacOS platforms on 32-bit or 64-bit architectures. Pin is the underlying infrastructure for commercial products like the Intel Parallel Studio tools. Pin is provided free of charge by Intel at http://www.pintool.org. [1]

Pin performs the instrumentation by running the unmodified application in a process-level virtual machine [1]. Pin intercepts the execution of the application at its first instruction and inserts the instrumentation code as and when required. The application code with the inserted instrumentation code is cached for subsequent executions to avoid repeating the instrumentation overhead.

Unlike the DLL injection used in exploit development, no new thread is created to run the code of the Pin VM or the PinTool. They are executed by the existing application threads only. However, a PinTool can create new threads if required.

Pin provides a C/C++ API for writing custom instrumentation code, known as PinTools, in the form of libraries (DLL files on Windows) that can be built with most common compilers. PinTools usually have two kinds of routines: instrumentation routines and analysis routines.

An instrumentation routine identifies the points or conditions where instrumentation code needs to be inserted and supplies a pointer to the analysis routine. Instrumentation routines are executed once in the lifecycle of the process and define “when” a PinTool should gain control of execution. Only instructions that are actually executed are instrumented, and all of them are. Pin can even instrument self-modifying code, because instructions are instrumented just before they are executed (Just-In-Time mode).

An analysis routine is the piece of code that is executed when the specified condition or point is hit during execution of the program. These routines are executed whenever the “when” is triggered. They define “what” to do when the PinTool gains execution control.

(Img1: Workflow of Pin on Windows Platform [1].)

The execution of Pin begins with the launcher process (pin.exe), which injects the Pin VMM (Virtual Machine Monitor) (pinvm.dll) and pin-tool.dll into the application's address space.

Pin keeps control of execution by copying application and instrumentation code to a software code cache and rewriting branches so that control remains in the cache. The program is always executed from the cache, and the original program is kept for reference.

As a dynamic instrumentation system, and to be useful in behaviour analysis of programs, Pin provides as much observability as it can while providing enough isolation that the actual behaviour of the program is unchanged. It notifies the PinTool of thread and process creation and destruction and of library load and unload.

As a process-level VM, Pin has full control over everything executed in user space but loses control in kernel mode. To manage the execution of system calls and regain control after returning from kernel mode, Pin monitors some of the system calls. Every Pin-monitored system call has a wrapper function in the ntdll.dll system library that loads the system call number and invokes the system call. Pin captures the system call number and arguments by attaching a debugger to a dummy process and single-stepping through the wrapper functions of the monitored system calls [1]. It is not possible to monitor all system calls this way, because many system calls are undocumented on Windows and there is not always a one-to-one mapping between wrapper functions and system calls. To handle this situation Pin implements a “System Gate” that intercepts system calls and switches to the VMM when an ‘int 2e’ or ‘sysenter’ instruction (32-bit platforms) or a ‘syscall’ instruction (64-bit architecture) is encountered [1].

Pin also provides a debugging interface through which one can attach a debugger of choice to the process running under Pin. The features of the debugger can also be extended through the DebugAPI.

Pin Instrumentation API:

Pin provides two modes of instrumentation: JIT (Just In Time) mode and Probe mode.

In JIT mode the application's code and the instrumentation code are generated and cached in the software code cache for execution. This provides more control over the execution because the code is generated by the Pin VM. JIT is the preferred mode of instrumentation.

In Probe mode the instrumented binary is modified in place. Because the code is not copied to the code cache, the instrumentation is a bit faster, at the cost of losing some functionality, and granularity is limited to routine level.

Pin provides five levels of instrumentation granularity:

  1. INS (Instruction Level): An instruction is the unit of execution that can be addressed individually on the given platform.
  2. BBL (Basic Block Level): A basic block is a sequence of instructions that starts at one entry point and ends at the first control transfer instruction [2].
  3. Trace (Trace Level): A trace is a sequence of contiguous instructions with one entry point [2]. A trace usually starts at the target of a jump and ends at an unconditional jump instruction.
  4. RTN (Routine Level): Routine level instrumentation allows instrumentation of methods or functions defined in the application or its dependencies. This is achieved by using the symbol information available in the export section and in external debug symbol (.pdb) files.
  5. IMG (Image Level): Image level instrumentation allows handling load/unload events for images linked to the application and navigating the sections of loaded images.
Puncture

This section describes a small subset of the functions made available to PinTool writers through the Pin API, in the context of a PinTool named ‘Puncture’ that we created to log the activities performed by an application.

On a Windows system a fairly good picture of the behaviour of a given application can be developed by monitoring its interaction with the file system, the registry, other processes and the network [8]. To log all these activities we created three modules that wrap commonly used functions of the following APIs:

  • RegistryAPI
  • FileAPI
  • NetworkAPI

Details are discussed later in this section.

As discussed earlier, PinTools are basically libraries linked dynamically to the application, i.e. DLL files on Windows. All PinTools must export their “main” function. The C code of a minimal PinTool that does not perform any instrumentation is listed below:

    #include <pin.H>

    int main(int argc, char *argv[])
    {
        // Initialize the instrumentation engine; non-zero means failure.
        if (PIN_Init(argc, argv))
            return -1;
        // Start the application; this call never returns.
        PIN_StartProgram();
        return 0;
    }

PIN_Init() initializes the instrumentation engine and passes it the initial arguments, one of which is the application name. PIN_StartProgram() starts the actual execution of the application and never returns. Hence all instrumentation tasks are set up before calling PIN_StartProgram.

If symbol information is required, as in the case of routine level instrumentation, PIN_InitSymbols() is called even before PIN_Init() to initialize symbol support. Symbols are retrieved from the standard symbol locations. Pin uses DBGHELP.DLL to provide symbol support, which is perhaps its only external dependency.

Most instrumentation routines are actually callback routines called on specific events. For example, to perform cleanup tasks such as closing log files and network connections, a “Fini” callback routine is registered using PIN_AddFiniFunction(fn, VOID* v); it is called after the analysis is finished.

In order to capture the arguments passed to functions called by the application, together with their return values, and to be able to log them, we used two approaches:

  • Replace the old signature of a function with a custom signature.
  • Register callback routines that run just before the function starts and when it returns.

All routine level instrumentation is performed when the image that contains the routine is loaded. A callback for image load is registered by calling IMG_AddInstrumentFunction(fn, VOID *v). The callback itself receives (IMG img, VOID *v), where ‘img’ is the object representing the loaded image in memory and “v” is the optional user-defined argument that was passed at registration.

When an image is loaded we can get its name/path as a std::string object by calling IMG_Name(img). Once we have identified the right image for instrumentation by comparing names, we iterate over the symbols in the image to identify the routines we need to instrument. Names retrieved from symbols may not exactly match the names of the routines we need to instrument, because the compiler mangles the names of overloaded functions to keep them unique. To handle name mangling, Pin provides PIN_UndecorateSymbolName to un-mangle the names. Once we have identified the name, we obtain the RTN object of the routine using RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym)). SYM_Value returns the offset of the routine from the image base address, i.e. IMG_LowAddress.

The following code listing is the part of the PinTool that replaces the signature of the “socket” function from ws2_32.dll.

    int main(int argc, char *argv[])
    {
        ...
        IMG_AddInstrumentFunction(Image, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }

    void Image(IMG img, void *v)
    {
        const char *lpImageName = StripPath(IMG_Name(img).c_str());
        // Instrument Network API
        if (!_strnicmp(lpImageName, "WS2_32.DLL", 15))
            Image_WS2_32(img, v);
        ...
    }
    void Image_WS2_32(IMG img, void *v)
    {
        RTN rtn;
        PROTO proto;

        // Walk every symbol in the image and look for "socket".
        for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
        {
            string sUndecFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);
            if ("socket" == sUndecFuncName)
            {
                rtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
                if (RTN_Valid(rtn))
                {
                    // Prototype of the original routine: SOCKET socket(int, int, int)
                    proto = PROTO_Allocate(PIN_PARG(WINDOWS::SOCKET), CALLINGSTD_STDCALL, "socket",
                                           PIN_PARG(int), PIN_PARG(int), PIN_PARG(int),
                                           PIN_PARG_END());
                    // Route calls to jwSocket, passing the context, the original
                    // function pointer and the three original arguments.
                    RTN_ReplaceSignature(rtn, (AFUNPTR) jwSocket,
                                         IARG_PROTOTYPE, proto,
                                         IARG_CONTEXT,
                                         IARG_ORIG_FUNCPTR,
                                         IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
                                         IARG_FUNCARG_ENTRYPOINT_VALUE, 1,
                                         IARG_FUNCARG_ENTRYPOINT_VALUE, 2,
                                         IARG_END);
                    PROTO_Free(proto);
                }
            }
            ...
        }
    }

To replace the signature of a routine, a prototype object (PROTO) is allocated and passed to RTN_ReplaceSignature. PROTO_Allocate takes the return type, the calling convention of the target routine, the name of the routine and the list of parameters. Parameters are given as Type and Size pairs. The PIN_PARG macro is provided to create the Type and Size pair for an argument. The end of the list is marked by PIN_PARG_END().

In JIT mode, replacing the signature with RTN_ReplaceSignature allows us to add new parameters to the routine or remove old ones. This is not allowed in probe mode, where the new signature must match the original signature. RTN_ReplaceSignature takes the RTN object to be replaced (rtn), a pointer to the new routine ((AFUNPTR) jwSocket), the prototype of the replaced routine (IARG_PROTOTYPE, proto) and the list of parameters for the new routine, ending with IARG_END. It returns a pointer to the original routine. The other parameters are explained below:

  • IARG_CONTEXT: pointer to the execution context (CONTEXT*).
  • IARG_ORIG_FUNCPTR: pointer to the original routine (AFUNPTR).
  • IARG_FUNCARG_ENTRYPOINT_VALUE, 0: value of the first parameter passed to the routine. It needs to be type cast properly before use.
  • ... is the placeholder for (IARG_FUNCARG_ENTRYPOINT_VALUE, n), where n is the zero-based index of the original parameter.

The order of the original parameters may be changed, and parameters can be skipped if the analysis function does not require them.
It is very common to call the original routine from the analysis routine. This can be done using PIN_CallApplicationFunction, as shown below in the “jwConnect” function, which replaces the original “connect” function in the same way that “jwSocket” replaced “socket” earlier.

    int jwConnect(CONTEXT *ctxt, AFUNPTR fpOrigin, WINDOWS::SOCKET socket,
                  WINDOWS::PSOCKADDR pSocketName, int iNameLen)
    {
        ...
        // Call the original routine and store its return value in iResult.
        PIN_CallApplicationFunction(ctxt, PIN_ThreadId(), CALLINGSTD_STDCALL, fpOrigin,
                                    PIN_PARG(int*), &iResult,
                                    PIN_PARG(WINDOWS::SOCKET), socket,
                                    PIN_PARG(WINDOWS::PSOCKADDR), pSocketName,
                                    PIN_PARG(int), iNameLen,
                                    PIN_PARG_END());
        ...
    }

The parameters are explained below:

  • ctxt: pointer to the context of the execution.
  • PIN_ThreadId(): returns the zero-based id, assigned by Pin, of the executing thread. It is used here as the id of the thread that will execute the function.
  • CALLINGSTD_STDCALL: calling convention of the function.
  • fpOrigin: address of the function to execute.
  • PIN_PARG(int*), &iResult: address of the int variable, in Type, Size, Value format, where the return value will be stored.
  • PIN_PARG(TypeOf(N)), N, ..., PIN_PARG_END(): list of input parameters passed to the routine in Type, Size, Value form. The end of the list is marked with PIN_PARG_END.

Another approach is to insert analysis calls on the boundaries of the routine. This approach is shown in the following code listing, where the “SetFilePointer” function from “kernel32.dll” is instrumented.

    else if ("SetFilePointer" == sUndecFuncName)
    {
        rtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
        if (RTN_Valid(rtn))
        {
            RTN_Open(rtn);
            // Called on entry, with parameters 0, 1 and 3 of the routine.
            RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR) b4SetFilePointer,
                           IARG_ADDRINT, FALSE,
                           IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
                           IARG_FUNCARG_ENTRYPOINT_VALUE, 1,
                           IARG_FUNCARG_ENTRYPOINT_VALUE, 3,
                           IARG_END);
            // Called on return, with the routine's return value.
            RTN_InsertCall(rtn, IPOINT_AFTER, (AFUNPTR) OnFileReturn,
                           IARG_ADDRINT, SETFILE_PTR,
                           IARG_ADDRINT, ...,
                           IARG_FUNCRET_EXITPOINT_VALUE,
                           IARG_END);
            RTN_Close(rtn);
        }
    }
Challenges and Limitations

The first challenge we encountered with Pin was control over I/O. Under instrumentation, console I/O is usually locked by the application once PIN_StartProgram is called and is therefore not available to the PinTool. In the case of a GUI application, we could not see a single line of PinTool output on the console. The only reliable way of handling this was file I/O, which is what the Pin documentation recommends. Another problem with I/O was that all files need to be opened up front, preferably in the “main” function, because opening files is not allowed in analysis routines. So it is not possible to create a per-thread log file unless the number of threads the application will create is known before instrumentation begins.

Pin does not recommend using the platform API directly in PinTools.

Using RTN_InsertCall(..., IPOINT_AFTER, ..., IARG_FUNCARG_ENTRYPOINT_VALUE, ...) to retrieve the values of parameters passed by reference after the function returns mostly resulted in incorrect values. Using RTN_ReplaceSignature is the reliable way in this scenario.

With Windows APIs that return a handle (e.g. “CreateFile”) instead of a primitive type like int or float, analysis routines received “0” or “Null” handles when RTN_InsertCall(..., IPOINT_AFTER, ..., IARG_FUNCRET_EXITPOINT_VALUE, ...) was used, while RTN_ReplaceSignature returned the correct value.

Using RTN_InsertCall(..., IPOINT_AFTER, ...) sometimes results in more calls to the analysis function than expected, because Pin finds and instruments all the “RET” instructions in the routine.

Identifying the right Windows API to instrument is another big challenge. Windows mostly provides two versions of the same function: a Unicode version (suffix ‘W’) and an ASCII version (suffix ‘A’). Developers call the function with no suffix, and the name is replaced with one of the two versions based on the project's build environment. A PinTool must therefore either instrument the version actually present in the binary or instrument both of them.

Some Unicode versions of functions internally call the ASCII version, or vice versa; in this case we might see more calls than expected.

Pin loses control when the program is running in kernel mode, so it might not be good enough for analysing rootkits, which are mostly written to work in kernel mode.
Conclusions

Although Dynamic Binary Instrumentation tools like Pin are developed primarily for analysing the behaviour of programs in contexts such as code coverage and deadlock detection, they can very well be used for identifying security-related issues too, such as file and network activity, system modification or the use of vulnerable APIs in development. Researchers can use these tools to implement techniques like taint analysis in order to identify vulnerabilities and develop exploits. This becomes more useful when using a debugger is not feasible because of anti-debugging techniques in malware, since Pin does not use the platform's debug API for instrumentation.

References

  [1]. Dynamic Program Analysis of Microsoft Windows Applications. {Alex Skaletsky, Tevi Devor, Nadav Chachmon, Robert Cohn, Kim Hazelwood, Vladimir Vladimirov, Moshe Bach}
  [2]. Pin: Intel's Dynamic Binary Instrumentation Engine (CGO 2010). {Robert Cohn, Tevi Devor}
  [3]. Analysing Parallel Programs with Pin. {Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi-Keung Luk, Gail Lyons, Harish Patil, Ady Tal}
  [4]. Controlling Program Execution through Binary Instrumentation. {Heidi Pan, Krste Asanović, Robert Cohn, Chi-Keung Luk}
  [5]. Dynamic Binary Instrumentation and Tools for Supporting Multi-Threaded Applications. {Moshe Bach}
  [6]. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. {Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, Kim Hazelwood}
  [7]. Hands-on Pin for Architecture, Operating System and Program Analysis. {Kim Hazelwood, Vijay Janapa Reddi}
  [8]. Practical Malware Analysis (BlackHat DC 2007). {Kris Kendall}
  [9]. Pin: Pin 2.8 User Guide. {http://www.pintool.org/docs/36111/Pin/html/}
