Protecting Source Code


Godfrey Nolan's class on Protecting Android Source code at AnDevCon 2012

Published in: Technology

  • - Hi, my name is Godfrey Nolan, president of RIIS LLC, a mobile development company in Southfield, MI.
    - Welcome to the conference; really looking forward to the next couple of days.
    - Author of Decompiling Java and now Decompiling Android, due out tomorrow.
    - Free copy of the book if you visit our booth and drop off your business card.
  • 2. Why are we here
    We're here to talk about how to protect your Android source code. We'll look at the tools and techniques that I've encountered while writing the book and as part of our security practice.
    - Hear no evil, see no evil
    Some of the excuses I regularly hear for why decompilation can be ignored:
    Create a good Android application and continue to improve on it, and the source will protect itself.
    Good support and regular upgrades are much better ways of protecting your code than any tools or techniques.
    Understanding your own code after 6 months is hard, so how can anyone else understand reverse-engineered code?
    However, the issue is not the quality of your code. APKs are client-side apps which often communicate with backend systems, so you need to protect any usernames, passwords or API keys that are exposed when the code is decompiled.
    - Decompile APK
    Now let's see how someone would decompile an APK. I'm going to use a simple app called Agile and Beyond, which we did for an agile conference in 2011.
    ACTION - open app using ddms shortcut, show a couple of pages and refresh
    ACTION - pull the file, show the code and compare to original, explain how it's not perfect Dataservice.
    adb shell, su, ls /data/app ls /data/app-private
    adb pull /data/app/
    .dex2jar\\dex2jar
    - Raising the bar
    As you can see, it's pretty easy to decompile the code; the hardest thing to do is to get the USB drivers. We have ProGuard, yes we do, but nobody is using it. Out of 100 APKs we downloaded, 1 was obfuscated correctly, and that was a PhoneGap project.
  • 3. Why is it so easy
    EASY ACCESS TO APKs
    There are a number of reasons why it's so easy, why we have this perfect storm. Most importantly, there is very easy access to the APKs. When I wrote Decompiling Java in 2004 it never really took off. Most Java code is written for server-side applications; there are some notable desktop exceptions, but by and large most code is server side. So if someone has access to your jar and class files then you have a much bigger problem, as they've already got over your firewall and hacked your server.
    When you download an APK to your phone or tablet, it is essentially a client-side app, so there are no firewalls to climb over or web servers to hack. Inside the APK file is a classes.dex file which can be converted back into a Java jar file and then decompiled.
    APK DESIGN
    Most applications that run on a virtual machine have problems with decompilation. Back in the day that included VB and any of its p-code variants. These days it includes Java (JVM), .NET (CLR) and now of course Android (DVM). VMs mean that a lot more information has to be included in the file. Typically there is also a separation of data and instructions, which makes the bytecode easier to understand. Currently what really hurts classes.dex, or Android, is its relationship to Java and the fact that there's a program called dex2jar which can revert classes.dex back into a Java jar file, which can then be decompiled with your favorite decompiler, such as JD-GUI. That's not to say that you couldn't decompile classes.dex directly; you could, but there isn't anything out there yet that does a good job of it. Pitch book!
    iPhones don't have the same issue, as the ipa is a binary file; it can be disassembled but not really decompiled. Talk about IDA Pro.
    RIM doesn't have the same issue, as the spec of its format is not published.
    Windows Phone 7 devices do have the same issue, as they also use a VM, the CLR, and there are several decompilers such as Reflector or ILSpy.
    HTML5/CSS apps are also open to decompilation. For most HTML5 apps, if you don't use a JavaScript compressor you're going to expose not only the JavaScript source but also the comments.
    NOBODY USING OBFUSCATION
    So now the good news: Android ships with ProGuard, which is a good first step in protecting your code; more on that later. It isn't perfect, but it's easy to use and it raises the bar, which is exactly what we're trying to do. The bad news is that very, very few people are using it. Would love to see an obfuscation coverage tool, similar to a code coverage tool, in the future.
  • 4. Why is it so easy (cont'd)
    Currently the main reason why decompilation is a problem for Android is its close relationship to Java. As you're probably aware, the Java code that you write gets compiled into a classes.dex file. And if you're not, then this is a good time to take a look inside an APK file, which is basically a zip file.
    ACTION - rename apk to zip and then unzip file.
    Regardless of what IDE or command line tool you use for your builds, your Java code gets compiled first into Java class files and then into the classes.dex file using the 'dx' command that comes with the Android SDK. The format of the classes.dex file is completely different from the Java class file. Don't know if you've been following the ongoing court case between Oracle and Google, but I had presumed that the classes.dex format came out of Google trying to avoid paying licensing fees to Sun or Oracle. However, now I'm pretty sure that they were trying to create a minimalist format for small phones which would have to run multiple virtual machines. If you want to convince yourself, take a look at the size of a classes.dex file and compare it to the corresponding jar file. It's typically a lot smaller.
    Xiaobo Pan from Hangzhou in China created a tool called dex2jar that reverses the dx process and converts the classes.dex file back into a jar file, so it can be decompiled using any of the many Java decompilers. It's not 100% perfect, so it's usually quite hard to recompile the code, but it's good enough in most cases to provide a lot of valuable information to a hacker.
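Since an APK is just a zip archive, its contents can be listed with nothing more than the standard library. The following is a minimal sketch, not part of the talk: the class name `ApkLister` and the in-memory sample archive (with placeholder contents rather than a real app) are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ApkLister {

    /** Lists the entry names of a zip-format stream, which is what an APK is. */
    static List<String> listEntries(InputStream in) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Builds a tiny in-memory stand-in for an APK (hypothetical placeholder contents). */
    static byte[] buildSampleArchive() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"AndroidManifest.xml", "classes.dex"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("placeholder".getBytes());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, list that file; otherwise list the in-memory sample.
        InputStream in = args.length > 0
                ? Files.newInputStream(Paths.get(args[0]))
                : new ByteArrayInputStream(buildSampleArchive());
        listEntries(in).forEach(System.out::println);
    }
}
```

Run against a real APK, the listing would normally include classes.dex and AndroidManifest.xml alongside the resources.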
  • 5. Why is it so easy (cont'd)
    In this slide I'm showing the format of the Java class file on the left-hand side and the classes.dex file on the right-hand side. The main difference is that there is only one classes.dex file in an APK, whereas there are multiple class files in a jar file.
    ACTION - show the class file in xml and the classes.dex file in xml
    The classes.dex file has: different structure, different opcodes, register based not stack based, multiple DVMs on device.
    The bytecode lives in the data section of the classes.dex file and in the code attribute section of the class file. This bytecode gets reverse engineered back into source code. And although dex2jar changes the Android bytecode back into Java bytecode, there is no reason why classes.dex can't be reverse engineered into Java source too.
    By the way, the principal way of protecting your files is obfuscation. Obfuscators work by renaming variables in the constant pool or in the ids area of the dex file.
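To make the "ids area" concrete, here is a rough sketch of reading the fixed 0x70-byte header at the start of a classes.dex file, which records how many string, type, prototype, field, method and class entries follow. The field offsets follow the published Dalvik Executable format; the class name `DexHeader` and the synthetic `sampleHeader()` bytes are assumptions for illustration, not a real dex file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class DexHeader {
    static final int HEADER_SIZE = 0x70;

    final String version;
    final int fileSize;
    final int stringIds, typeIds, protoIds, fieldIds, methodIds, classDefs;

    private DexHeader(String version, ByteBuffer buf) {
        this.version = version;
        this.fileSize = buf.getInt(32);   // file_size
        this.stringIds = buf.getInt(56);  // string_ids_size
        this.typeIds = buf.getInt(64);    // type_ids_size
        this.protoIds = buf.getInt(72);   // proto_ids_size
        this.fieldIds = buf.getInt(80);   // field_ids_size
        this.methodIds = buf.getInt(88);  // method_ids_size
        this.classDefs = buf.getInt(96);  // class_defs_size
    }

    /** Parses the header of a dex file; the magic is "dex\n", a 3-digit version, then NUL. */
    static DexHeader parse(byte[] dex) {
        if (dex.length < HEADER_SIZE
                || dex[0] != 'd' || dex[1] != 'e' || dex[2] != 'x' || dex[3] != '\n') {
            throw new IllegalArgumentException("not a DEX file");
        }
        // Standard dex files are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN);
        return new DexHeader(new String(dex, 4, 3, StandardCharsets.US_ASCII), buf);
    }

    /** Synthetic header for the demo only; the counts are made up. */
    static byte[] sampleHeader() {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(new byte[] {'d', 'e', 'x', '\n', '0', '3', '5', 0});
        buf.putInt(32, HEADER_SIZE);  // file_size
        buf.putInt(36, HEADER_SIZE);  // header_size
        buf.putInt(40, 0x12345678);   // endian_tag
        buf.putInt(56, 12);           // string_ids_size
        buf.putInt(88, 5);            // method_ids_size
        buf.putInt(96, 2);            // class_defs_size
        return buf.array();
    }

    public static void main(String[] args) {
        DexHeader h = parse(sampleHeader());
        System.out.println("dex version " + h.version + ", strings=" + h.stringIds
                + ", methods=" + h.methodIds + ", classes=" + h.classDefs);
    }
}
```

The string ids table is where class, field and method names end up, which is why renaming has such a visible effect on decompiled output.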
  • 6. Possible Exploits
    OK, you've seen how easy it is to gain access to the source, so what exactly does that mean? What are the possible exploits?
    ACTION - show API key, username and passwords and credit card information
    The names and locations have been changed to protect the innocent.
    While I don't think gaining access to one person's credit card information is a huge issue, you'll probably find that it's not PCI compliant and you'll fail an audit.
    There is also the possibility that someone could recompile your app, or a modified version of your app, and harvest usernames and passwords.
  • 7. Downloading APKs
    Let's take another look at downloading APKs from the phone or device onto your computer for decompilation.
    Backup using Astro File Manager, then use the SD card to get it off the phone.
    ACTION - show Astro File Manager
    Rooting phone - Z4root uses the rage against the cage exploit. It spins up as many adb shells as your phone can handle and the last one standing is rooted. There are similar exploits such as GingerBreak for Gingerbread and Superboot for Ice Cream Sandwich.
    If your APK is out there and it's got any number of downloads, you can be fairly sure that it's been shared on any number of forums.
    ACTION - do a search for xda-developers forum fandango apk and click on [Q] fandango apk
    So if there are any issues in old APKs, be aware that simply fixing them and doing a marketplace update isn't going to be enough if the old APK still has the keys to the castle.
  • Explain the difference between disassemblers and decompilers.
    Already talked about the Android Debug Bridge or adb, dex2jar and JD-GUI.
    We also saw that dx can be used to compile Java classes into classes.dex files. It's also worth looking at its output log, as it's one of the more complete disassemblers and does a really good job of pulling apart the classes.dex file.
    ACTION - show dx command and output log
    dexdump is another disassembler, more in the vein of javap, that also comes with the Android SDK.
    ACTION - show dexdump command and output
    Dedexer is an alternative to dx's log file, written by Gabor Paller in Hungary. Personally I really like dedexer as it's easy to parse, so I used it a lot in the book as a starting point so that I didn't have to convert the hexadecimal bytes before parsing the classes.dex file.
    ACTION - show dedexer command and output
    smali and baksmali continue the Icelandic motif that you'll find everywhere in Android: baksmali means disassembler and smali means assembler, to go along with Dalvik (a fishing town in Iceland) as in the Dalvik Virtual Machine.
    AXMLPrinter2 converts the compressed AndroidManifest.xml in an APK back into a readable format.
    Let's use the apktool to show smali and AXMLPrinter2.
    ACTION - run apktool d
  • So what can you do to protect yourself? The two options are obfuscation and, if you want to go a step further, the Android NDK or Native Development Kit. We've already said that iPhones have less of an issue with decompilation as the code is compiled down into a binary. The good news is that you can do the same using the Android NDK: you can write your code in C++ and use the NDK to compile it into a library that can be included in your APK. The bad news is that you need a different version for each chip you are targeting. Almost every phone and tablet runs ARM, so unless your APK is going to be running on an Intel chip you should be OK.
  • Need to know what types of obfuscation are out there so you can decide what makes sense for you and just how high you want to raise the bar.
    Christian Collberg wrote a paper called A Taxonomy of Obfuscating Transformations, which is where I took the list from. We can break obfuscations into 3 main types, namely Layout, Control and Data. The more transformations you employ, the less likely it is that anyone or any tool can understand the original source. 99% of the obfuscation that you'll find in early Java obfuscators was layout obfuscation.
    ACTION - show layout.java
    The concept behind control obfuscations is to confuse anyone looking at decompiled source by breaking up the control flow of the source. Functional blocks that belong together are broken apart, and functional blocks that don't belong together are intermingled to make the source much more difficult to understand. If you remember Goto Considered Harmful, well, the holy grail of obfuscation is to do just that: interleave gotos in the bytecode so that the control flow becomes irreducible or almost impossible to decompile.
    ACTION - show and Interleave.java
    Data obfuscations reshape the data into less natural forms to create confusion when someone is looking at your code.
    The best example of a data obfuscation is from ProGuard. In this demo we're using WordPress' open source Android app so we can compare the original source to the obfuscated source. This is taken from the book and was a method chosen at random from the WordPress source.
    From the mapping file we know that the public static void escapeHtml(Writer writer, String string) and public static void unescapeHtml(Writer writer, String string) methods have been pushed to a separate file which uses data obfuscation and is basically unintelligible.
    ACTION - show and
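As a small illustration of the first two categories, the sketch below shows one readable method next to hand-obfuscated equivalents. It is not taken from the talk's layout.java or Interleave.java demos; the class `ObfuscationDemo` and its methods are hypothetical, and a real obfuscator applies these transformations to bytecode rather than to source.

```java
class ObfuscationDemo {

    // Original, readable method.
    static int totalPriceInCents(int unitPriceInCents, int quantity, int discountPercent) {
        int subtotal = unitPriceInCents * quantity;
        int discount = subtotal * discountPercent / 100;
        return subtotal - discount;
    }

    // Layout obfuscation: identical logic, identifiers scrambled to short meaningless names.
    static int a(int a, int b, int c) {
        int d = a * b;
        int e = d * c / 100;
        return d - e;
    }

    // Control obfuscation on top: dead code plus an opaque predicate that is always true,
    // because the product of two consecutive integers is always even.
    static int b(int a, int b, int c) {
        int d = a * b;
        int e;
        int f = d ^ c;                 // dead value, never reaches the result
        if ((d * (d + 1)) % 2 == 0) {  // always taken
            e = d * c / 100;
        } else {
            e = f + 7;                 // unreachable branch
        }
        return d - e;
    }

    public static void main(String[] args) {
        System.out.println(totalPriceInCents(250, 4, 10));
        System.out.println(a(250, 4, 10));
        System.out.println(b(250, 4, 10));
    }
}
```

All three methods compute the same result; only the effort needed to read them changes.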
  • Thankfully we don't have to apply these obfuscations to the code ourselves. We have two very good obfuscators, one open source (ProGuard) and the other commercial (DashO).
    ProGuard ships with the Android SDK. ProGuard not only obfuscates, it also shrinks and optimizes your code, removing unused instructions, classes, fields, methods, and attributes. Like all obfuscators it renames classes, fields, and methods using short meaningless names, and it also performs other control and data obfuscations.
    To get ProGuard to run, enable it in the file; the configuration file is already optimized for Android apps.
    proguard.config=proguard.cfg
    ACTION - show file, show the configuration file, show the proguard GUI
    Always double check your APK to make sure it was obfuscated!
    DashO (basic): Improves on ProGuard's naming by using strange characters and heavily reusing the same names at different scopes. Supports string encryption to render important string data unreadable to attackers. Does a lot more control flow obfuscation than ProGuard, reordering code operations to make them very difficult to understand and often breaking decompilers. Supports tamper detection, handling, and reporting to prevent users from changing the compiled code, even while debugging, and to alert you if it happens. And it can automatically inject PreEmptive's Runtime Intelligence functionality for remote error reporting, which can be very important as it can be an extra burden to trace defects in obfuscated code.
    Although it doesn't actually protect your code, putting a digital fingerprint or watermark in your code allows you to later prove that you wrote it. Ideally this fingerprint, usually a copyright notice, acts like a software watermark that you can retrieve at any time, even if your original code went through a number of changes or manipulations before it made it into someone else's Java application or applet.
    DashO allows you to put a watermark in your code for later retrieval. jmark and jdecode are two such apps: jmark places a watermark in a dummy method and jdecode recovers it. But always double check, as optimizing obfuscators may remove your watermark.
    ACTION - show DashO
    If all else fails then you can compile your passwords using the Android NDK.
    ACTION - show Slide 11 NDK.c
    And if you're using any of the HTML5/CSS cross-platform tools such as PhoneGap or Titanium, then you might want to look at JavaScript compressors such as YUICompressor or JSMin to compress your JavaScript, remove any comments and help hide any usernames etc.
    ACTION - show Slide 11 YUI Compressor.js
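To give a feel for what string encryption buys you, here is a deliberately simple sketch that keeps a literal out of plain sight by storing it XOR-masked and Base64-encoded. This is an illustrative stand-in and not DashO's algorithm; the class `StringHider` and its key are hypothetical. Because the key ships in the same binary, it only raises the bar against a casual search of the decompiled strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class StringHider {
    // Hypothetical masking key; a determined attacker can recover it from the binary.
    private static final byte[] KEY = {0x5A, 0x13, 0x7C, 0x29};

    /** Done at build time: turns a readable literal into an opaque one. */
    static String hide(String plain) {
        return Base64.getEncoder().encodeToString(xor(plain.getBytes(StandardCharsets.UTF_8)));
    }

    /** Done at run time: recovers the original literal just before it is used. */
    static String reveal(String hidden) {
        return new String(xor(Base64.getDecoder().decode(hidden)), StandardCharsets.UTF_8);
    }

    // XOR with a repeating key is its own inverse, so one helper serves both directions.
    private static byte[] xor(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] ^ KEY[i % KEY.length]);
        }
        return out;
    }

    public static void main(String[] args) {
        String hidden = hide("api-key:12345");
        System.out.println(hidden);
        System.out.println(reveal(hidden));
    }
}
```

Only the hidden form would appear in the shipped code, with `reveal` called at the point of use.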
  • Once you have obfuscated the code, all the method renaming can make it difficult to debug an APK after it's been made available in the marketplace. Thankfully each obfuscator has a mapping file that shows you which methods got renamed to what.
    ACTION - show slide mapping.txt
    ProGuard also has a retrace.jar file which can be used in conjunction with the mapping.txt file and your stack trace to help you debug what happened.
    java -jar retrace.jar mapping.txt stackfile.trace
    However, this can become a nightmare if you have multiple updates, which of course will have different obfuscations each time. So you need to come up with some solution, such as storing the mapping.txt files in your Subversion or GitHub repositories, so you can always get back to them for each version of the APK.
    Unit testing is also an issue with obfuscation; as you can see, some of the methods can change quite dramatically. Currently we're doing unit testing before obfuscation and then integration or functional tests after obfuscation, along with some automated UI tests. There are ways to tell the proguard.cfg file to ignore files for unit testing, but it didn't work very well for us and we ended up not obfuscating too many classes. Would love to hear anyone's input on this after the class.
    As you can see from the Obfuscation Theory slide, obfuscation is defactoring as opposed to refactoring, so it may seem very counterintuitive to many. I'm not advocating that you start employing bad programming practices; what I am advocating is the use of a tool to automate the process.
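The idea behind retracing can be sketched in a few lines. ProGuard's mapping.txt lists each class as `original.Name -> obfuscated.name:` followed by indented member lines; the simplified sketch below reads only the class lines and restores class names in a stack trace. The class `MiniRetrace` and the sample names are hypothetical, and the real retrace.jar also restores method names and line numbers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiniRetrace {
    // Class lines are unindented and end with a colon; member lines are indented.
    private static final Pattern CLASS_LINE = Pattern.compile("^(\\S+) -> (\\S+):$");
    // Stack frames look like "    at some.Class.method(Source)".
    private static final Pattern FRAME =
            Pattern.compile("^\\s*at ([\\w.$]+)\\.([\\w$<>]+)\\(", Pattern.MULTILINE);

    /** Returns a map from obfuscated class name to original class name. */
    static Map<String, String> parseClassMappings(String mappingText) {
        Map<String, String> obfuscatedToOriginal = new LinkedHashMap<>();
        for (String line : mappingText.split("\\R")) {
            Matcher m = CLASS_LINE.matcher(line);
            if (m.matches()) {
                obfuscatedToOriginal.put(m.group(2), m.group(1));
            }
        }
        return obfuscatedToOriginal;
    }

    /** Replaces obfuscated class names in each stack frame; method names are left as-is. */
    static String retraceClasses(String stackTrace, Map<String, String> obfuscatedToOriginal) {
        Matcher m = FRAME.matcher(stackTrace);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            String cls = m.group(1);
            out.append(stackTrace, last, m.start(1));
            out.append(obfuscatedToOriginal.getOrDefault(cls, cls));
            last = m.end(1);
        }
        out.append(stackTrace.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        String mapping = "com.example.DataService -> a.a:\n"
                + "    void load() -> a\n"
                + "com.example.Main -> a.b:\n";
        String trace = "java.lang.NullPointerException\n"
                + "\tat a.a.a(Unknown Source)\n"
                + "\tat a.b.main(Unknown Source)\n";
        System.out.print(retraceClasses(trace, parseClassMappings(mapping)));
    }
}
```

Because the mapping changes with every build, the mapping.txt fed to a tool like this has to be the one saved for that exact release.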
  • WordPress has an open source Android app that is great for comparing obfuscators. It's a large real-world app where you have access to the source code. Let's take a look at the unobfuscated and obfuscated jars for the WordPress app.
    ACTION - show unobfuscated.jar, show wordpress_proguard, show wordpress_dasho
    Run ProGuard.
    ACTION - show proguard.out and explain about shrinking
    ACTION - run proguard and run dasho
    ACTION - show
  • I found out yesterday that the launch has been pushed back to June 20th. The book is currently available on Apress' alpha books, which is an ebook format, and you can pre-order it on Amazon. Also, we're giving away a free book to anyone who drops off their business card at our booth tomorrow. We'll ship a signed copy to you once it comes out.
    The book has lots of the same code I've shown here, and it also has DexToXML, a classes.dex disassembler, and DexToSource, which is the first Android decompiler. But before you get all excited, it's not very comprehensive. It'll do the examples in the book and not much else. Both are written in ANTLR, and it's pretty easy to follow the code and extend it if you really want to join me in my obsession with parsing bytecode.
  • If you want to follow the latest developments with the book and the parsing tools that we're working on then please go to and if you want to send me an email, then my address is
    If you're interested in having us help you secure your code or learn more about our Android development projects then you can find out more about RIIS at or visit us at our booth tomorrow.
    Had some good news today: one of our clients has instituted some fixes that stop their iPhone app from being cracked, which is a good segue into the future of decompilation. In the early days Mocha was the big decompiler; HoseMocha put a stop to that by adding an extra pop bytecode which Mocha couldn't handle. We're looking at ways to do that with dex2jar and whatever else comes along. It's a mini arms race between the hackers and the security folks.
  • Protecting Source Code

    1. Godfrey Nolan
    2. Hear no evil, see no evil; Decompiling APK demo; Raising the bar
    3. Easy access to APKs; APK design; Nobody using obfuscation
    4. According to DuoSecurity: Over 50% of Android phones are rootable; See for more information. Vulnerabilities: ASHMEM, Exploid, Gingerbreak, Levitator, Mempodroid, etc.
    5. Logins; API keys; Credit card information; Fake apps
    6. sdcard; Rooting phone; Download from forums
    7. Obfuscation; Android NDK; SQLCipher for SQLite; Google Closure for JavaScript in HTML5/CSS; Don't use keys - login each time; Break tools: Dex2Jar and Baksmali; Google Encryption in Jelly Bean (RIP); Hide key info elsewhere (see resources)
    8. Obfuscation Theory: Layout; Control; Data
    9. Obfuscation Type | Classification | Transformation
       Layout | | Scramble identifiers.
       Control | Computations | Insert dead or irrelevant code. Extend a loop condition. Reducible to non-reducible. Add redundant operands. Remove programming idioms. Parallelize code.
       Control | Aggregations | Inline and outline methods. Interleave methods. Clone methods. Loop transformations.
       Control | Ordering | Reorder statements. Reorder loops. Reorder expressions.
       Data | Storage and encoding | Change encoding. Split variables. Convert static data to procedural data.
       Data | Aggregation | Merge scalar variables. Factor a class. Insert a bogus class. Refactor a class. Split an array. Merge arrays. Fold an array. Flatten an array.
       Data | Ordering | Reorder methods and instance variables. Reorder arrays.
    10. Obfuscators: ProGuard and DexGuard; DashO
    11. Application size; Performance; Remove logging, debugging, testing code; Protection
    12. At the bytecode level: Dead code elimination; Constant propagation; Method inlining; Class merging; Remove logging code; Peephole optimizations; Devirtualization
    13. Nothing is unbreakable, you can raise the bar: Reflection; String encryption; Class encryption; Tamper detection; Debug detection; Emulator detection
    14. Bug fixing; Unit testing; Obfuscation = defactoring
    15. WordPress; ProGuard & DexGuard; DashO; HoseDex2Jar; NDK
    16. DexToXML; DexToSource; Giveaway: What does Dex stand for?
    17.
    18. @decompiling