SlideShare a Scribd company logo
Automated static deobfuscation in
the context of Reverse Engineering


     Sebastian Porst (sebastian.porst@zynamics.com)
          Christian Ketterer (cketti@gmail.com)
Sebastian           Christian
• zynamics GmbH     • Student
• Lead Developer    • University of Karlsruhe
  – BinNavi         • Deobfuscation
  – REIL/MonoREIL
This talk



Obfuscated Code                                     Readable Code




                  (mysterious things happen here)

    20%                       40%                      40%
Motivation
• Combat common obfuscation
  techniques
• Can it be done?
• Will it produce useful results?
• Can it be integrated into our
  technology stack?
Examples of Obfuscation
Simple

         •   Jump chains
         •   Splitting calculations
         •   Garbage code insertion
         •   Predictable branches
         •   Self-modifying code
         •   Control-flow flattening
         •   Opaque predicates
         •   Code parallelization
         •   Virtual Machines
         •   ...

Tricky
Our Deobfuscation Approach
I. Copy ancient algorithms
     from compiler theory books
II. Translate obfuscated
     assembly code to REIL
III. Run algorithms on REIL code
IV. Profit (?)
We‘re late in the game ...

      199X   2000    2001    2002     2003       2004   2005     2006    2007     2008    2009




U of Auckland
                                                           U of Wisc +                   zynamics
                                    F. Perriot
                                                           TU Munich
                                    Mathur
                                                               U of Ghent
                                    M. Mohammed
                                                                Mathur

                                                                   Christodorescu
(see end of this presentation for proper source references)
                                                                        Bruschi
... but

199X   2000   2001   2002    2003   2004   2005   2006   2007   2008   2009




 Malware Research
 Defensive Reverse Engineering
 Offensive Reverse Engineering
REIL
• Reverse Engineering Intermediate Language
• Specifically designed for Reverse Engineering
• Design Goal: As simple as possible, but not
  simpler
• In use since 2007
Uses of REIL
Register Tracking: Helps Reverse Engineers follow data flow through code
(Never officially presented)



Index Underflow Detection: Automatically find negative array accesses
(CanSecWest 2009, Vancouver)



Automated Deobfuscation: Make obfuscated code more readable
(SOURCE Barcelona 2009, Barcelona)



ROP Gadget Generator: Automatically generates return-oriented shellcode
(Work in progress; scheduled for Q1/2010)
The REIL Instruction Set
Arithmetical    Bitwise   Data Transfer   Logical   Other

   ADD           AND         STR          BISZ      NOP
   SUB           OR          LDM          JCC       UNDEF
   MUL           XOR         STM                    UNKN
   DIV
   MOD
   BSH
Why REIL?
• Simplifies input code
• Makes effects obvious
• Makes algorithms platform-independent
MonoREIL
• Monotone Framework for REIL
• Based on Abstract Interpretation
• Used to write static code analysis algorithms




                                    http://www.flickr.com/photos/wedrrc/3586908193/
Why MonoREIL?
• In General: Makes complicated algorithms
  simple (trade brain effort for runtime)
• Deobfuscator: Wrong choice really, but we
  wanted more real-life test cases for MonoREIL
Building the Deobfuscator
• Java
• BinNavi Plugin
• REIL + MonoREIL




                    http://www.flickr.com/photos/mattimattila/3602654187/
Block Merging
• Long chains of basic blocks ending with
  unconditional jumps
• Confusing to follow in text-based
  disassemblers
• Advantage of higher abstraction level in
  BinNavi
  – Block merging is purely cosmetic
Block Merging
Before          After
Constant Propagation and Folding
• Two different concepts
• One algorithm in our implementation
• Partial evaluation of the input code
Constant Propagation and Folding
    Before             After
Dead Branch Elimination
• Removes branches that are never executed
  – Turns conditional jumps into unconditional jumps
  – Removes code from unreachable branch
• Requires constant propagation/folding
Dead Branch Elimination
Before             After
Dead Code Elimination
• Removes code that computes unused values
• Gets rid of inserted garbage code
• Cleans up after constant propagation/folding
Dead Code Elimination
Before            After
Dead Store Elimination
•   Comparable to dead code elimination
•   Removes useless memory write accesses
•   Limited to stack access in our implementation
•   Only platform-specific part of our optimizer
Dead Store Elimination
Before            After
Suddenly it dawned us:
Deobfuscation for RE brings new problems
which do not exist in other areas
Let‘s get some help
Problem: Side effects


push 10
                                  mov eax, 10
pop eax




                      Removed code was used
                      • in a CRC32 integrity check
                      • as key of a decryption routine
                      • as part of an anti-debug check
                      • ...
Problem: Code Blowup


mov eax, 10               mov eax, 20
add eax, 10               clc
                          ...




                          Good luck setting
                          • AF
                          • CF
                          • OF
                          • PF
                          • ZF
Problem: Moving addresses


0000: jmp ecx
                            0000: jmp ecx
0002: push 10
                            0002: mov eax, 10
0003: pop eax




ecx is 0003 but          we just missed the
static analysis          pop instruction
can not know this
Problem: Inability to debug



                             mov eax, 10




Executable Input File   Deobfuscated list of
                        Instructions but no
                        executable file
The only way to solve all* problems:


           A full-blown native
           code compiler with an
           integrated optimizer

    Too much work, maybe we can approximate ...


* except for the side-effects issue
Only generate optimized REIL code
    Before              After
Only generate optimized REIL code
• Produces excellent input for   • Side effects problem remains
other analysis algorithms        • Pretty much unreadable for
• Code blow-up solved            human reverse engineers
• Keeps address/instruction
mapping
• Code can not be debugged
natively but interpreted
Effect comments
Before            After
Effect comments
• Results can easily be used by   • Side effects problem remains
human reverse engineers           • Address mapping problem
• Code blow-up solved             • Code can not be debugged
                                  • Comments have semantic
                                  meaning
Extract formulas from code
 Before             After
Extract formulas from code
• Results can easily be used by   • Not really deobfuscation (but
human reverse engineers           produces similar result?)
• No code generation necessary,
only extraction of semantic
information
• Solves all problems because
original program remains
unchanged
Implement a small pseudo-compiler
    Before              After
Implement a small pseudo-compiler
• This is what we did              • Side effects problem remains
• Closest thing to the real deal   • Address mapping problem
• Code blow-up is solved           remains
    • Partially                    • Why not go for a complete
• Natively debug the output        compiler?
    • not in our case
    • pseudo x86 instructions
Economic value in creating a complete
    optimizing compiler for RE?

             Not for us

           • Small company
           • Limited market
         • Wrong approach?
Alternative Approaches
• Deobfuscator built into disassembler
• REIL-based formula extraction
• Hex-Rays Decompiler
• Code optimization and generation based on
  LLVM
• Emulation / Dynamic deobfuscation
Conclusion
• The concept of static deobfuscation is sound
  – Except for things like side-effects, SMC, ...
• A lot of work
• Expression reconstruction might be much
  easier and still produce comparable results
Related work
• A taxonomy of obfuscating transformations
• Defeating polymorphism through code
  optimization
• Code Normalization for Self-Mutating Malware
• Software transformations to improve malware
  detection
• Zeroing in on Metamorphic Computer Viruses
• ...
http://www.flickr.com/photos/marcobellucci/3534516458/

More Related Content

Similar to Automated static deobfuscation in the context of Reverse Engineering

Introduction to Aspect Oriented Programming (DDD South West 4.0)
Introduction to Aspect Oriented Programming (DDD South West 4.0)Introduction to Aspect Oriented Programming (DDD South West 4.0)
Introduction to Aspect Oriented Programming (DDD South West 4.0)
Yan Cui
 
Steelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with PythonSteelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with Python
infodox
 
Effective development of a finite-element solver at the University
Effective development of a finite-element solver at the UniversityEffective development of a finite-element solver at the University
Effective development of a finite-element solver at the University
Romain Boman
 
Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®
Hannes Lowette
 
Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...
Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...
Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...
Anne Nicolas
 
Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®
Hannes Lowette
 
Introduction to Aspect Oriented Programming by Donald Belcham
Introduction to Aspect Oriented Programming by Donald BelchamIntroduction to Aspect Oriented Programming by Donald Belcham
Introduction to Aspect Oriented Programming by Donald Belcham
.NET Conf UY
 
Introduction To AOP
Introduction To AOPIntroduction To AOP
Introduction To AOP
Donald Belcham
 
H 264 in cuda presentation
H 264 in cuda presentationH 264 in cuda presentation
H 264 in cuda presentation
ashoknaik120
 
Surge2012
Surge2012Surge2012
Surge2012
davidapacheco
 
Oh the compilers you'll build
Oh the compilers you'll buildOh the compilers you'll build
Oh the compilers you'll build
Mark Stoodley
 
Applications of the Reverse Engineering Language REIL
Applications of the Reverse Engineering Language REILApplications of the Reverse Engineering Language REIL
Applications of the Reverse Engineering Language REIL
zynamics GmbH
 
10 Reasons You MUST Consider Pattern-Aware Programming
10 Reasons You MUST Consider Pattern-Aware Programming10 Reasons You MUST Consider Pattern-Aware Programming
10 Reasons You MUST Consider Pattern-Aware Programming
PostSharp Technologies
 
Intro To AOP
Intro To AOPIntro To AOP
Intro To AOP
Donald Belcham
 
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
midnite_runr
 
Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...
Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...
Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...
frank2
 
Jfokus 2016 - A JVMs Journey into Polyglot Runtimes
Jfokus 2016 - A JVMs Journey into Polyglot RuntimesJfokus 2016 - A JVMs Journey into Polyglot Runtimes
Jfokus 2016 - A JVMs Journey into Polyglot Runtimes
Charlie Gracie
 
Pipiot - the double-architecture shellcode constructor
Pipiot - the double-architecture shellcode constructorPipiot - the double-architecture shellcode constructor
Pipiot - the double-architecture shellcode constructor
Moshe Zioni
 
Introduction to multicore .ppt
Introduction to multicore .pptIntroduction to multicore .ppt
Introduction to multicore .ppt
Rajagopal Nagarajan
 
Vulnerability, exploit to metasploit
Vulnerability, exploit to metasploitVulnerability, exploit to metasploit
Vulnerability, exploit to metasploit
Tiago Henriques
 

Similar to Automated static deobfuscation in the context of Reverse Engineering (20)

Introduction to Aspect Oriented Programming (DDD South West 4.0)
Introduction to Aspect Oriented Programming (DDD South West 4.0)Introduction to Aspect Oriented Programming (DDD South West 4.0)
Introduction to Aspect Oriented Programming (DDD South West 4.0)
 
Steelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with PythonSteelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with Python
 
Effective development of a finite-element solver at the University
Effective development of a finite-element solver at the UniversityEffective development of a finite-element solver at the University
Effective development of a finite-element solver at the University
 
Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®
 
Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...
Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...
Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...
 
Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®
 
Introduction to Aspect Oriented Programming by Donald Belcham
Introduction to Aspect Oriented Programming by Donald BelchamIntroduction to Aspect Oriented Programming by Donald Belcham
Introduction to Aspect Oriented Programming by Donald Belcham
 
Introduction To AOP
Introduction To AOPIntroduction To AOP
Introduction To AOP
 
H 264 in cuda presentation
H 264 in cuda presentationH 264 in cuda presentation
H 264 in cuda presentation
 
Surge2012
Surge2012Surge2012
Surge2012
 
Oh the compilers you'll build
Oh the compilers you'll buildOh the compilers you'll build
Oh the compilers you'll build
 
Applications of the Reverse Engineering Language REIL
Applications of the Reverse Engineering Language REILApplications of the Reverse Engineering Language REIL
Applications of the Reverse Engineering Language REIL
 
10 Reasons You MUST Consider Pattern-Aware Programming
10 Reasons You MUST Consider Pattern-Aware Programming10 Reasons You MUST Consider Pattern-Aware Programming
10 Reasons You MUST Consider Pattern-Aware Programming
 
Intro To AOP
Intro To AOPIntro To AOP
Intro To AOP
 
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
 
Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...
Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...
Binary Obfuscation from the Top Down: Obfuscation Executables without Writing...
 
Jfokus 2016 - A JVMs Journey into Polyglot Runtimes
Jfokus 2016 - A JVMs Journey into Polyglot RuntimesJfokus 2016 - A JVMs Journey into Polyglot Runtimes
Jfokus 2016 - A JVMs Journey into Polyglot Runtimes
 
Pipiot - the double-architecture shellcode constructor
Pipiot - the double-architecture shellcode constructorPipiot - the double-architecture shellcode constructor
Pipiot - the double-architecture shellcode constructor
 
Introduction to multicore .ppt
Introduction to multicore .pptIntroduction to multicore .ppt
Introduction to multicore .ppt
 
Vulnerability, exploit to metasploit
Vulnerability, exploit to metasploitVulnerability, exploit to metasploit
Vulnerability, exploit to metasploit
 

More from zynamics GmbH

Automated Mobile Malware Classification
Automated Mobile Malware ClassificationAutomated Mobile Malware Classification
Automated Mobile Malware Classification
zynamics GmbH
 
VxClass for Incident Response
VxClass for Incident ResponseVxClass for Incident Response
VxClass for Incident Response
zynamics GmbH
 
Malware classification
Malware classificationMalware classification
Malware classificationzynamics GmbH
 
Hitb
HitbHitb
Eusecwest
EusecwestEusecwest
Eusecwest
zynamics GmbH
 
Bh dc09
Bh dc09Bh dc09
Bh dc09
zynamics GmbH
 

More from zynamics GmbH (6)

Automated Mobile Malware Classification
Automated Mobile Malware ClassificationAutomated Mobile Malware Classification
Automated Mobile Malware Classification
 
VxClass for Incident Response
VxClass for Incident ResponseVxClass for Incident Response
VxClass for Incident Response
 
Malware classification
Malware classificationMalware classification
Malware classification
 
Hitb
HitbHitb
Hitb
 
Eusecwest
EusecwestEusecwest
Eusecwest
 
Bh dc09
Bh dc09Bh dc09
Bh dc09
 

Automated static deobfuscation in the context of Reverse Engineering

  • 1. Automated static deobfuscation in the context of Reverse Engineering Sebastian Porst (sebastian.porst@zynamics.com) Christian Ketterer (cketti@gmail.com)
  • 2. Sebastian Christian • zynamics GmbH • Student • Lead Developer • University of Karlsruhe – BinNavi • Deobfuscation – REIL/MonoREIL
  • 3. This talk Obfuscated Code Readable Code (mysterious things happen here) 20% 40% 40%
  • 4. Motivation • Combat common obfuscation techniques • Can it be done? • Will it produce useful results? • Can it be integrated into our technology stack?
  • 5. Examples of Obfuscation Simple • Jump chains • Splitting calculations • Garbage code insertion • Predictable branches • Self-modifying code • Control-flow flattening • Opaque predicates • Code parallelization • Virtual Machines • ... Tricky
  • 6. Our Deobfuscation Approach I. Copy ancient algorithms from compiler theory books II. Translate obfuscated assembly code to REIL III. Run algorithms on REIL code IV. Profit (?)
  • 7. We‘re late in the game ... 199X 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 U of Auckland U of Wisc + zynamics F. Perriot TU Munich Mathur U of Ghent M. Mohammed Mathur Christodorescu (see end of this presentation for proper source references) Bruschi
  • 8. ... but 199X 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 Malware Research Defensive Reverse Engineering Offensive Reverse Engineering
  • 9. REIL • Reverse Engineering Intermediate Language • Specifically designed for Reverse Engineering • Design Goal: As simple as possible, but not simpler • In use since 2007
  • 10. Uses of REIL Register Tracking: Helps Reverse Engineers follow data flow through code (Never officially presented) Index Underflow Detection: Automatically find negative array accesses (CanSecWest 2009, Vancouver) Automated Deobfuscation: Make obfuscated code more readable (SOURCE Barcelona 2009, Barcelona) ROP Gadget Generator: Automatically generates return-oriented shellcode (Work in progress; scheduled for Q1/2010)
  • 11. The REIL Instruction Set Arithmetical Bitwise Data Transfer Logical Other ADD AND STR BISZ NOP SUB OR LDM JCC UNDEF MUL XOR STM UNKN DIV MOD BSH
  • 12.
  • 13. Why REIL? • Simplifies input code • Makes effects obvious • Makes algorithms platform-independent
  • 14. MonoREIL • Monotone Framework for REIL • Based on Abstract Interpretation • Used to write static code analysis algorithms http://www.flickr.com/photos/wedrrc/3586908193/
  • 15. Why MonoREIL? • In General: Makes complicated algorithms simple (trade brain effort for runtime) • Deobfuscator: Wrong choice really, but we wanted more real-life test cases for MonoREIL
  • 16. Building the Deobfuscator • Java • BinNavi Plugin • REIL + MonoREIL http://www.flickr.com/photos/mattimattila/3602654187/
  • 17. Block Merging • Long chains of basic blocks ending with unconditional jumps • Confusing to follow in text-based disassemblers • Advantage of higher abstraction level in BinNavi – Block merging is purely cosmetic
  • 19. Constant Propagation and Folding • Two different concepts • One algorithm in our implementation • Partial evaluation of the input code
  • 20. Constant Propagation and Folding Before After
  • 21. Dead Branch Elimination • Removes branches that are never executed – Turns conditional jumps into unconditional jumps – Removes code from unreachable branch • Requires constant propagation/folding
  • 23. Dead Code Elimination • Removes code that computes unused values • Gets rid of inserted garbage code • Cleans up after constant propagation/folding
  • 25. Dead Store Elimination • Comparable to dead code elimination • Removes useless memory write accesses • Limited to stack access in our implementation • Only platform-specific part of our optimizer
  • 27. Suddenly it dawned us: Deobfuscation for RE brings new problems which do not exist in other areas
  • 29. Problem: Side effects push 10 mov eax, 10 pop eax Removed code was used • in a CRC32 integrity check • as key of a decryption routine • as part of an anti-debug check • ...
  • 30. Problem: Code Blowup mov eax, 10 mov eax, 20 add eax, 10 clc ... Good luck setting • AF • CF • OF • PF • ZF
  • 31. Problem: Moving addresses 0000: jmp ecx 0000: jmp ecx 0002: push 10 0002: mov eax, 10 0003: pop eax ecx is 0003 but we just missed the static analysis pop instruction can not know this
  • 32. Problem: Inability to debug mov eax, 10 Executable Input File Deobfuscated list of Instructions but no executable file
  • 33. The only way to solve all* problems: A full-blown native code compiler with an integrated optimizer Too much work, maybe we can approximate ... * except for the side-effects issue
  • 34. Only generate optimized REIL code Before After
  • 35. Only generate optimized REIL code • Produces excellent input for • Side effects problem remains other analysis algorithms • Pretty much unreadable for • Code blow-up solved human reverse engineers • Keeps address/instruction mapping • Code can not be debugged natively but interpreted
  • 37. Effect comments • Results can easily be used by • Side effects problem remains human reverse engineers • Address mapping problem • Code blow-up solved • Code can not be debugged • Comments have semantic meaning
  • 38. Extract formulas from code Before After
  • 39. Extract formulas from code • Results can easily be used by • Not really deobfuscation (but human reverse engineers produces similar result?) • No code generation necessary, only extraction of semantic information • Solves all problems because original program remains unchanged
  • 40. Implement a small pseudo-compiler Before After
  • 41. Implement a small pseudo-compiler • This is what we did • Side effects problem remains • Closest thing to the real deal • Address mapping problem • Code blow-up is solved remains • Partially • Why not go for a complete • Natively debug the output compiler? • not in our case • pseudo x86 instructions
  • 42. Economic value in creating a complete optimizing compiler for RE? Not for us • Small company • Limited market • Wrong approach?
  • 43. Alternative Approaches • Deobfuscator built into disassembler • REIL-based formula extraction • Hex-Rays Decompiler • Code optimization and generation based on LLVM • Emulation / Dynamic deobfuscation
  • 44. Conclusion • The concept of static deobfuscation is sound – Except for things like side-effects, SMC, ... • A lot of work • Expression reconstruction might be much easier and still produce comparable results
  • 45. Related work • A taxonomy of obfuscating transformations • Defeating polymorphism through code optimization • Code Normalization for Self-Mutating Malware • Software transformations to improve malware detection • Zeroing in on Metamorphic Computer Viruses • ...