How to really obfuscate
  your PDF malware


   Sebastian Porst - ReCon 2010
Email: sebastian.porst@zynamics.com
        Twitter: @LambdaCube
                                      1
Targeted Attacks 2008



                          Adobe Acrobat
  Microsoft Word;         Reader; 28.61%
      34.55%


                            Microsoft
                           PowerPoint;
       Microsoft Excel;      16.87%
           19.97%


     http://www.f-secure.com/weblog/archives/00001676.html 2
Targeted Attacks 2009



                    Adobe Acrobat
  Microsoft Word;
                    Reader; 48.87%
      39.22%




                           Microsoft
   Microsoft              PowerPoint;
  Excel; 7.39%              4.52%
                                        3
Exploited in the wild

CVE-            CVE-            CVE-            CVE-
2007-           2009-           2009-           2009-
5659            0658            1492            4324




        CVE-            CVE-            CVE-            CVE-
        2008-           2009-           2009-           2010-
        2992            0927            3459            0188
Four common exploit paths

Broken PDF Parser

Vulnerable JavaScript Engine

Vulnerable external libraries

/Launch
                                5
PDF Malware Obfuscation
 Different tricks for different purposes

Make manual analysis more difficult

Resist automated analysis

Avoid detection by virus scanners

                                           6
PDF Malware Obfuscation
         Conflicting goals


Avoid detection      Make analysis
   by being         difficult by being
  wellformed           malformed



                                         7
How to achieve these goals

  Being harmless         Being evil
• Avoid JavaScript       • Use heavy
• Do not use unusual       obfuscation
  encodings              • Try to break tools
• Do not try to break
  parser-based tools
• Ideally use an 0-day
                                                8
Let‘s be evil

                9
Breaking tools
Rule #1: Do the unexpected



                             11
This is what tools expect
• ASCII Strings
• Boring encodings like #41 instead of A
• Wellformed or only moderately malformed
  PDF file structure




                                            12
Malformed documents
• Adobe Reader tries to load malformed PDF
  files
• Very, very liberal interpretation of the PDF
  specification
• Parser-based analysis tools need to know
  about Adobe Reader file correction



                                                 13
Malformed PDF file – Example I
7 0 obj
  <<
   /Type /Action
   /S    /JavaScript
   /JS   (app.alert('whatever');)
  >>
endobj


                                    14
Malformed PDF file – Example II
5 0 obj
  << /Length 45 >>
  stream
    some data
  endstream
endobj




                                   15
Further reading




                  16
Obfuscating JavaScript code
Goal of JavaScript obfuscation




 Hide the shellcode

                                 18
JavaScript obfuscation in the wild
•   Screwed up formatting
•   Name obfuscation
•   Eval-chains
•   Splitting JavaScript code
•   Simple anti-emulation techniques
•   callee-trick
•   ...

                                         19
Screwed up formatting
• Basically just remove all newlines
• Completely useless: jsbeautifier.org




                                         20
Name obfuscation
• Variables or function names are renamed to
  hide their meaning
• Most JavaScript obfuscators screw this up




                                               21
Obfuscation example: Original code
function executePayload(payload, delay)
{
  if (delay > 1000)
  {
    // Whatever
  }
}

function heapSpray(code, repeat)
{
  for (i=0;i<repeat;i++)
  {
    code = code + code;
  }
}
                                          22
Obfuscation without considering scope
function executePayload(hkof3ewhoife, fhpfewhpofe)
{
  if (fhpfewhpofe > 1000)
  {
    // Whatever
  }
}

function heapSpray(hoprwehjoprew, hoifwep43)
{
  for (jnpfw93=0;jnpfw93<hoifwep43;jnpfw93++)
  {
    hoprwehjoprew = hoprwehjoprew + hoprwehjoprew;
  }
}
                                                     23
Obfuscation with considering scope
function executePayload(grtertttrr, hnpfefwefee)
{
  if (hnpfefwefee > 1000)
  {
    // Whatever
  }
}

function heapSpray(grtertttrr, hnpfefwefee)
{
  for (hjnprew=0;hjnprew<hnpfefwefee;hjnprew++)
  {
    grtertttrr = grtertttrr + grtertttrr;
  }
}
                                                   24
Obfuscation: Going the whole way
function ____(____, _____)
{
  if (_____ > 1000)
  {
    // Whatever
  }
}

function _____(____, _____)
{
  for (______=0; ______<_____; ______++)
  {
    ____ = ____ + ____;
  }
}
                                           25
Name obfuscation: Lessons learned
• Consider name scope
  – Deobfuscator needs to know scoping rules too
• Use underscores
  – Drives human analysts crazy
• Also cute: Use meaningful names that have
  nothing to do with the variable
  – Maybe shuffle real variable names


                                                   26
Eval chains
• JavaScript code can execute JavaScript code in
  strings through eval
• Often used to hide later code stages which are
  decrypted on the fly
• Common way to extract argument: replace
  eval with a printing function



                                               27
Eval chains: Doing it better
• Make sure your later stages reference
  variables or functions from earlier stages
• Re-use individual eval statements multiple
  times to make sure eval calls can not just be
  replaced




                                                  28
JavaScript splitting
• JavaScript can be split over several PDF
  objects
• These scripts can be executed consecutively
• Context is preserved between scripts
• In the wild I‘ve seen splitting across 2-4
  objects



                                                29
JavaScript splitting: Doing it better
• One line of JavaScript per object
• Randomize the order of JavaScript objects
• Admittedly it takes only one script to sort and
  extract the scripts from the objects




                                                    30
Anti-emulation code
• Simple checks for Adobe Reader extensions
• Multistaged JavaScript code




                                              31
Current malware loads code from

Pages

Annotations

Info Dictionary
                                  32
Example: Loading code from
            annotations
y = app.doc;

y.syncAnnotScan();

var p = y["getAnnots"]({nPage: 0});

var s = p[0].subject;

eval(s);
                                  33
Problems with current approaches



    Code is         Easy to
   in the file      extract


                               34
Anti-emulation code: Improved
 Key ideas behind anti-emulation code

Find idiosyncrasies in the
Adobe JavaScript engine

Find extensions that are
difficult to emulate
                                        35
Exhibit A: Idiosyncrasy
cypher = [7, 17, 28, 93, 4, 10, 4, 30, 7, 77, 83, 72];
cypherLength = cypher.length;

hidden = "ThisIsNotTheKeyYouAreLookingFor";
hiddenLength = hidden.toString().length;

for(i=0,j=0;i<cypherLength;i++,j++)
{
  cypherChar = cypher[i];
  keyChar = hidden.toString().charCodeAt(j);
  cypher[i] = String.fromCharCode(cypherChar ^ keyChar);

    if (j == hiddenLength - 1)
      j = -1;
}

eval(cypher.join(""));
                                                           36
Exhibit A: Explained

  JavaScript Standard         Adobe Reader JavaScript

  hidden = false;              hidden = false;
  hidden = "Key";              hidden = "Key";




hidden has the value „Key“   hidden has the value „true“
                                                     37
Exhibit A: Explained
The Adobe Reader JavaScript engine
defines global variables that do not
change their type on assignment.


(I suspect this happens because they are backed by C++ code)




                                                               38
Exhibit B: Difficult to emulate
• Goal: Find Adobe JavaScript API functions
  which are nearly impossible to emulate
• Then use effects of these functions in sneaky
  ways to change malware behavior
• The Adobe Reader JavaScript documentation
  is your friend



                                                  39
Exhibit B: Difficult to emulate
       Functions to look for

Rendering engine

Forms extensions

Multimedia extensions
                                   40
Exhibit B: Difficult to emulate
crypt = "T^_]^[T IEYYD__ FuRRKBD ";
plain = Array();
key = getPageNthWordQuads(0, 0).toString().split(",")[1];

for (i=0,j=0;i<crypt.length;i++,j++)
{
   plain = plain + String.fromCharCode((crypt.charCodeAt(i) ^
key.charCodeAt(j)));

    if (j >= key.length)
        j = 0;
}

app.alert(plain);
)


                                                            41
Exhibit B: Difficult to emulate
       Functions to avoid


Anything with
security restrictions

                                   42
Exhibit C: Multi-threaded JavaScript
• Multi-threaded applications are difficult to
  reverse engineer
• Problem: There are no threads in JavaScript
• Solution: setTimeOut
• Example: Cooperative multi-threading with
  message-passing between objects



                                                 43
Basic idea
• Multiple server objects
• String messages are passed between servers
• Messages contain new timeout value and
  code to evaluate




                                               44
function Server(name)
{
  ...
}

s1 = new Server("S1");
s2 = new Server("S2");

s1.receive(ENCODED_MESSAGE);




                               45
function Server(name)
{
  this.name = name;

  this.receive = function(message)
  {
    recipient = parse_recipient(message)
    delayTime = parse_delay(message)
    eval_string = parse_eval_string(message)
    msg_string = parse_message_string(message)

     eval(eval_string);
     command = "recipient.receive('" + msg_string + "')";
     this.x = app.setTimeOut(command, delayTime);
}
};




                                                            46
How to improve this
• Use a global string object as the message
  queue and manipulate the object on the fly
• Usage of non-commutative operations so that
  execution order really matters
• Message broadcasting
• Add anti-emulation code to eval-ed code



                                            47
callee-trick
• Not specific to Adobe Reader
• Frequently used by JavaScript code in other
  contexts
• Function accesses its own source and uses it
  as a key to decrypt code or data
• Add a single whitespace and decryption fails



                                                 48
callee-trick Example
function decrypt(cypher)
{
  var key = arguments.callee.toString();

    for (var i = 0; i < cypher.length; i++)
    {
      plain = key.charCodeAt(i) ^ cypher.charCodeAt(i);
    }

    ...
}




                                                          49
More ideas for the future
• Combine anti-debugging, callee-trick, and
  message passing
• Find more JavaScript engine idiosyncracies:
  Sputnik JavaScript test suite




                                                50
Thanks
•   Didier Stevens
•   Julia Wolf
•   Peter Silberman
•   Bruce Dang




                               51
52

How to really obfuscate your pdf malware

  • 1.
    How to reallyobfuscate your PDF malware Sebastian Porst - ReCon 2010 Email: sebastian.porst@zynamics.com Twitter: @LambdaCube 1
  • 2.
    Targeted Attacks 2008 Adobe Acrobat Microsoft Word; Reader; 28.61% 34.55% Microsoft PowerPoint; Microsoft Excel; 16.87% 19.97% http://www.f-secure.com/weblog/archives/00001676.html 2
  • 3.
    Targeted Attacks 2009 Adobe Acrobat Microsoft Word; Reader; 48.87% 39.22% Microsoft Microsoft PowerPoint; Excel; 7.39% 4.52% 3
  • 4.
    Exploited in thewild CVE- CVE- CVE- CVE- 2007- 2009- 2009- 2009- 5659 0658 1492 4324 CVE- CVE- CVE- CVE- 2008- 2009- 2009- 2010- 2992 0927 3459 0188
  • 5.
    Four common exploitpaths Broken PDF Parser Vulnerable JavaScript Engine Vulnerable external libraries /Launch 5
  • 6.
    PDF Malware Obfuscation Different tricks for different purposes Make manual analysis more difficult Resist automated analysis Avoid detection by virus scanners 6
  • 7.
    PDF Malware Obfuscation Conflicting goals Avoid detection Make analysis by being difficult by being wellformed malformed 7
  • 8.
    How to achievethese goals Being harmless Being evil • Avoid JavaScript • Use heavy • Do not use unusual obfuscation encodings • Try to break tools • Do not try to break parser-based tools • Ideally use an 0-day 8
  • 9.
  • 10.
  • 11.
    Rule #1: Dothe unexpected 11
  • 12.
    This is whattools expect • ASCII Strings • Boring encodings like #41 instead of A • Wellformed or only moderately malformed PDF file structure 12
  • 13.
    Malformed documents • AdobeReader tries to load malformed PDF files • Very, very liberal interpretation of the PDF specification • Parser-based analysis tools need to know about Adobe Reader file correction 13
  • 14.
    Malformed PDF file– Example I 7 0 obj << /Type /Action /S /JavaScript /JS (app.alert('whatever');) >> endobj 14
  • 15.
    Malformed PDF file– Example II 5 0 obj << /Length 45 >> stream some data endstream endobj 15
  • 16.
  • 17.
  • 18.
    Goal of JavaScriptobfuscation Hide the shellcode 18
  • 19.
    JavaScript obfuscation inthe wild • Screwed up formatting • Name obfuscation • Eval-chains • Splitting JavaScript code • Simple anti-emulation techniques • callee-trick • ... 19
  • 20.
    Screwed up formatting •Basically just remove all newlines • Completely useless: jsbeautifier.org 20
  • 21.
    Name obfuscation • Variablesor function names are renamed to hide their meaning • Most JavaScript obfuscators screw this up 21
  • 22.
    Obfuscation example: Originalcode function executePayload(payload, delay) { if (delay > 1000) { // Whatever } } function heapSpray(code, repeat) { for (i=0;i<repeat;i++) { code = code + code; } } 22
  • 23.
    Obfuscation without consideringscope function executePayload(hkof3ewhoife, fhpfewhpofe) { if (fhpfewhpofe > 1000) { // Whatever } } function heapSpray(hoprwehjoprew, hoifwep43) { for (jnpfw93=0;jnpfw93<hoifwep43;jnpfw93++) { hoprwehjoprew = hoprwehjoprew + hoprwehjoprew; } } 23
  • 24.
    Obfuscation with consideringscope function executePayload(grtertttrr, hnpfefwefee) { if (hnpfefwefee > 1000) { // Whatever } } function heapSpray(grtertttrr, hnpfefwefee) { for (hjnprew=0;hjnprew<hnpfefwefee;hjnprew++) { grtertttrr = grtertttrr + grtertttrr; } } 24
  • 25.
    Obfuscation: Going thewhole way function ____(____, _____) { if (_____ > 1000) { // Whatever } } function _____(____, _____) { for (______=0; ______<_____; ______++) { ____ = ____ + ____; } } 25
  • 26.
    Name obfuscation: Lessonslearned • Consider name scope – Deobfuscator needs to know scoping rules too • Use underscores – Drives human analysts crazy • Also cute: Use meaningful names that have nothing to do with the variable – Maybe shuffle real variable names 26
  • 27.
    Eval chains • JavaScriptcode can execute JavaScript code in strings through eval • Often used to hide later code stages which are decrypted on the fly • Common way to extract argument: replace eval with a printing function 27
  • 28.
    Eval chains: Doingit better • Make sure your later stages reference variables or functions from earlier stages • Re-use individual eval statements multiple times to make sure eval calls can not just be replaced 28
  • 29.
    JavaScript splitting • JavaScriptcan be split over several PDF objects • These scripts can be executed consecutively • Context is preserved between scripts • In the wild I‘ve seen splitting across 2-4 objects 29
  • 30.
    JavaScript splitting: Doingit better • One line of JavaScript per object • Randomize the order of JavaScript objects • Admittedly it takes only one script to sort and extract the scripts from the objects 30
  • 31.
    Anti-emulation code • Simplechecks for Adobe Reader extensions • Multistaged JavaScript code 31
  • 32.
    Current malware loadscode from Pages Annotations Info Dictionary 32
  • 33.
    Example: Loading codefrom annotations y = app.doc; y.syncAnnotScan(); var p = y["getAnnots"]({nPage: 0}); var s = p[0].subject; eval(s); 33
  • 34.
    Problems with currentapproaches Code is Easy to in the file extract 34
  • 35.
    Anti-emulation code: Improved Key ideas behind anti-emulation code Find idiosyncrasies in the Adobe JavaScript engine Find extensions that are difficult to emulate 35
  • 36.
    Exhibit A: Idiosyncrasy cypher= [7, 17, 28, 93, 4, 10, 4, 30, 7, 77, 83, 72]; cypherLength = cypher.length; hidden = "ThisIsNotTheKeyYouAreLookingFor"; hiddenLength = hidden.toString().length; for(i=0,j=0;i<cypherLength;i++,j++) { cypherChar = cypher[i]; keyChar = hidden.toString().charCodeAt(j); cypher[i] = String.fromCharCode(cypherChar ^ keyChar); if (j == hiddenLength - 1) j = -1; } eval(cypher.join("")); 36
  • 37.
    Exhibit A: Explained JavaScript Standard Adobe Reader JavaScript hidden = false; hidden = false; hidden = "Key"; hidden = "Key"; hidden has the value „Key“ hidden has the value „true“ 37
  • 38.
    Exhibit A: Explained TheAdobe Reader JavaScript engine defines global variables that do not change their type on assignment. (I suspect this happens because they are backed by C++ code) 38
  • 39.
    Exhibit B: Difficultto emulate • Goal: Find Adobe JavaScript API functions which are nearly impossible to emulate • Then use effects of these functions in sneaky ways to change malware behavior • The Adobe Reader JavaScript documentation is your friend 39
  • 40.
    Exhibit B: Difficultto emulate Functions to look for Rendering engine Forms extensions Multimedia extensions 40
  • 41.
    Exhibit B: Difficultto emulate crypt = "T^_]^[T IEYYD__ FuRRKBD "; plain = Array(); key = getPageNthWordQuads(0, 0).toString().split(",")[1]; for (i=0,j=0;i<crypt.length;i++,j++) { plain = plain + String.fromCharCode((crypt.charCodeAt(i) ^ key.charCodeAt(j))); if (j >= key.length) j = 0; } app.alert(plain); ) 41
  • 42.
    Exhibit B: Difficultto emulate Functions to avoid Anything with security restrictions 42
  • 43.
    Exhibit C: Multi-threadedJavaScript • Multi-threaded applications are difficult to reverse engineer • Problem: There are no threads in JavaScript • Solution: setTimeOut • Example: Cooperative multi-threading with message-passing between objects 43
  • 44.
    Basic idea • Multipleserver objects • String messages are passed between servers • Messages contain new timeout value and code to evaluate 44
  • 45.
    function Server(name) { ... } s1 = new Server("S1"); s2 = new Server("S2"); s1.receive(ENCODED_MESSAGE); 45
  • 46.
    function Server(name) { this.name = name; this.receive = function(message) { recipient = parse_recipient(message) delayTime = parse_delay(message) eval_string = parse_eval_string(message) msg_string = parse_message_string(message) eval(eval_string); command = "recipient.receive('" + msg_string + "')"; this.x = app.setTimeOut(command, delayTime); } }; 46
  • 47.
    How to improvethis • Use a global string object as the message queue and manipulate the object on the fly • Usage of non-commutative operations so that execution order really matters • Message broadcasting • Add anti-emulation code to eval-ed code 47
  • 48.
    callee-trick • Not specificto Adobe Reader • Frequently used by JavaScript code in other contexts • Function accesses its own source and uses it as a key to decrypt code or data • Add a single whitespace and decryption fails 48
  • 49.
    callee-trick Example function decrypt(cypher) { var key = arguments.callee.toString(); for (var i = 0; i < cypher.length; i++) { plain = key.charCodeAt(i) ^ cypher.charCodeAt(i); } ... } 49
  • 50.
    More ideas forthe future • Combine anti-debugging, callee-trick, and message passing • Find more JavaScript engine idiosyncracies: Sputnik JavaScript test suite 50
  • 51.
    Thanks • Didier Stevens • Julia Wolf • Peter Silberman • Bruce Dang 51
  • 52.