
Bytecode manipulation with Javassist for fun and profit



Java bytecode is the form of instructions that the JVM executes.
A Java programmer, normally, does not need to be aware of how Java bytecode works.

Understanding bytecode is, however, essential for tooling and program analysis, where applications modify the bytecode to adjust behavior to their own domain. Profilers, mocking tools, AOP, ORM frameworks, IoC containers, boilerplate code generators, etc. require a thorough understanding of Java bytecode and a means of manipulating it at runtime.
Each of these advanced features, which are now standard practice in Java programming, requires a sound understanding of Java bytecode, not to mention entirely new languages running on the JVM such as Scala or Clojure.

Bytecode manipulation is not easy though ... except with Javassist.
Of all the libraries and tools providing advanced bytecode manipulation features, Javassist is the easiest to use and the quickest to master. An experienced Java developer needs only a few minutes to understand Javassist and use it efficiently. And mastering bytecode manipulation opens a whole new world of approaches and possibilities.

Published in: Technology


  1. 1. 1 © Jerome Kehrli @ Bytecode manipulation with Javassist for fun and profit
  2. 2. 2 » Java bytecode is the form of instructions that the JVM executes. » A Java programmer, normally, does not need to be aware of how Java bytecode works. » Understanding the bytecode is essential for tooling and program analysis, where applications modify the bytecode to adjust behavior to their own domain. + Profilers, + Mocking tools, + AOP, + ORM frameworks, + IoC Containers, + Boilerplate code generators, + etc. » Bytecode manipulation is not easy though ... except with Javassist. + Simple, efficient and natural way + An experienced Java developer needs only a few minutes to understand and master it » And mastering bytecode manipulation opens a whole new world of approaches and possibilities. Java Bytecode Manipulation
  3. 3. 3 » Boilerplate Code Generation » Lightweight and simple IoC Container Objectives for today Present Javassist and bytecode manipulation Introduce it in the light of 2 use cases
  4. 4. 4 Bytecode Manipulation ? » Bytecode manipulation consists of modifying, at runtime, the classes - represented by bytecode - that the Java compiler produced. Java Bytecode ? » Java source files are compiled to Java class files by the Java compiler. These class files contain bytecode, which the JVM loads to execute the Java program. Bytecode Manipulation
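To make the notion of a class file concrete, the sketch below (not taken from the slides) reads the first four bytes of a compiled class and returns them as an int. Every class file starts with the magic number 0xCAFEBABE. The class name `ClassFileMagic` and its method are hypothetical, and the sketch assumes `java/lang/Object.class` is reachable as a system resource, which should hold on a standard JDK.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only.
public final class ClassFileMagic {

    /** Reads the 4-byte magic number at the start of a class file found on the class path. */
    public static int readMagic(String resourcePath) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IllegalArgumentException("No such class file: " + resourcePath);
            }
            // Class files are big-endian, which is what DataInputStream reads.
            return new DataInputStream(in).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int magic = readMagic("java/lang/Object.class");
        System.out.println(Integer.toHexString(magic));
    }
}
```

Everything after those four bytes (constant pool, fields, methods, code attributes) is what a bytecode manipulation library parses and rewrites.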
  5. 5. 5 1. Introduction 2. Javassist 3. Java Instrumentation framework and applications 4. Use Case A: Boilerplate Code Generation and Project Lombok 5. Use Case B: Simple and lightweight IoC Container 6. Conclusion Agenda
  6. 6. 6 1. Introduction
  7. 7. 7 » 3 techniques 1. Type Introspection 2. Runtime Reflection 3. Bytecode Manipulation
  8. 8. 8 » In this article we'll dig into the library Javassist which is a bytecode manipulation framework » But before, let's describe two different, unrelated but complementary techniques: Type Introspection and Runtime Reflection Runtime Reflection
  9. 9. 9 1. Type Introspection
  10. 10. 10 2. Runtime Reflection
  11. 11. 11 1. Because Javassist keeps its API as close as possible to the Java Runtime Reflection API, so that it feels natural to Java developers. 2. Perhaps more importantly, because behaviour injected into Java classes using bytecode manipulation is not known to the compiler. Thus, it is sometimes only available through runtime reflection. Why important ?
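As a rough illustration of the first two techniques, the sketch below uses only the JDK. `describe` performs type introspection (examining the runtime type of an object), while `invokeByName` uses runtime reflection to call a method known only by name, which is how code injected after compilation typically has to be reached. The class and method names are hypothetical and not taken from the slides.

```java
import java.lang.reflect.Method;

// Hypothetical demo class, for illustration only.
public final class ReflectionDemo {

    /** Type introspection: look at the runtime type of an object. */
    public static String describe(Object o) {
        if (o instanceof CharSequence) {
            return "text of length " + ((CharSequence) o).length();
        }
        return "instance of " + o.getClass().getName();
    }

    /** Runtime reflection: call a public no-argument method known only by its name. */
    public static Object invokeByName(Object target, String methodName) {
        try {
            Method m = target.getClass().getMethod(methodName);
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot invoke " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("abc"));
        System.out.println(describe(42));
        System.out.println(invokeByName("abc", "length"));
    }
}
```

The compiler never sees a call to `length()` in `invokeByName`; the method is resolved when the program runs, so the same pattern works for a getter that was added to a class by bytecode manipulation.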
  12. 12. 12 3. Bytecode Manipulation » Bytecode manipulation allows the developer to express instructions in a format that is directly understood by the Java Virtual Machine, without going from source code to bytecode through the compiler. » Bytecode is somewhat similar to assembly code directly interpretable by the CPU but + bytecode is interpreted by a Virtual Machine, the JVM, + it is much more understandable than assembly code.
  13. 13. 13 Bytecode Manipulation typical use cases: » ORM frameworks such as Hibernate use bytecode manipulation to inject, for instance, relationship management code (lazy loading, etc.) inside mapped entities. » FindBugs inspects bytecode for static code analysis » Languages like Groovy, Scala, Clojure generate bytecode from different source code. » IoC frameworks such as Spring use it to seamlessly weave your application lifecycle together » Language extensions like AspectJ can augment the capabilities of Java by modifying the classes that the Java compiler generated » etc. Use Cases
  14. 14. 14 » Write your own compiler for any kind of new and crazy language » Generate on the fly sub-classes of already loaded classes and use them instead of original classes to get additional behaviour » Write an instrumentation agent that plugs right into the JVM and modifies behaviour of classes before they are loaded by the classloader » etc. Several ways Our focus for today
  15. 15. 15 Most common libraries (1)
  16. 16. 16 » ASM is a project of the OW2 Consortium. It provides a simple API for decomposing, modifying, and recomposing binary Java classes. ASM exposes the internal aggregate components of a given Java class through its visitor oriented API. ASM also provides, on top of this visitor API, a tree API that represents classes as object constructs. Both APIs can be used for modifying the binary bytecode, as well as generating new bytecode. » BCEL provides a simple library that exposes the internal aggregate components of a given Java class through its API as object constructs (as opposed to the disassembly of the lower-level opcodes). These objects also expose operations for modifying the binary bytecode, as well as generating new bytecode (via injection of new code into the existing code, or through generation of new classes altogether). » CGLIB is a powerful, high performance and quality Code Generation Library; it is used to extend Java classes and implement interfaces at runtime. CGLIB is really oriented towards implementing new classes at runtime, as opposed to modifying existing bytecode as other libraries do. » Javassist is a Java library providing a means to manipulate the Java bytecode of an application. In this sense Javassist provides the support for structural reflection, i.e. the ability to change the implementation of a class at run time. Most common libraries (2)
  17. 17. 17 2. Javassist
  18. 18. 18 From the Javassist web site: » Javassist (Java Programming Assistant) makes Java bytecode manipulation simple. » It is a class library for editing bytecodes in Java; it enables Java programs + to define a new class at runtime and + to modify a class file at loading time (when the JVM loads it) » Unlike other similar bytecode editors, Javassist provides two levels of API: + source level and + bytecode level. » If the users use the source-level API, they can edit a class file without knowledge of the specifications of the Java bytecode. + The whole API is designed with only the vocabulary of the Java language. You can even specify inserted bytecode in the form of source text; Javassist compiles it on the fly. » On the other hand, the bytecode-level API allows the users to directly edit a class file as other editors. Javassist
  19. 19. 19 + to define a new class at runtime and + to modify a class file at loading time » The Linkage problem ! + Once a class has already been loaded, changing it would result in a LinkageError (unless the JVM is launched with the JPDA [Java Platform Debugger Architecture] enabled, which would make a class dynamically reloadable). + Interestingly, Javassist is perfectly able to modify a class long after the application has started as long as that specific class has not been loaded. Runtime vs. Loading Time Loading Time Runtime
  20. 20. 20 » Javassist : a high level API around classes, methods, fields, etc. + making it as easy as possible to change the implementation of existing classes + even implement completely new classes API
  21. 21. 21 ClassPool » This program first obtains a ClassPool object, which controls bytecode modification with Javassist. + The ClassPool object is a container of CtClass objects representing class files. + It reads a class file on demand for constructing a CtClass object and records the constructed object for responding later accesses. CtClass » The CtClass object obtained from a ClassPool object can be modified. » In the example above, it is modified so that the superclass of test.Rectangle is changed into a class test.Point. + This change is reflected on the original class file when writeFile() in CtClass is finally called. Introduction Example
  22. 22. 22 » writeFile() translates the CtClass object into a class file and writes it on a local disk. » Javassist also provides a method for directly obtaining the modified bytecode: (Bear in mind that this is especially useful when implementing a Java agent) » You can directly load the CtClass as well: » Finally, a modified class should be returned to the pool to make the enhanced version available to the ClassLoader: Saving a modified CtClass
  23. 23. 23 » To define a new class from scratch, makeClass() must be called on a ClassPool. » This program defines a class Circle including no members except those inherited by the parent class Point. » Member methods of Circle can afterwards be created with factory methods declared in CtNewMethod and appended to Circle with addMethod() in CtClass. » makeClass() cannot create a new interface; makeInterface() in ClassPool can do. + Member methods in an interface can be created with abstractMethod() in CtNewMethod. Note that an interface method is an abstract method. Defining a new class
  24. 24. 24 » Methods are represented by CtMethod objects. + CtMethod provides several methods for modifying the definition of the method » Constructors are represented by their own type in Javassist: CtConstructor. + Both CtMethod and CtConstructor extend the same base class and have a lot of their API in common. » CtMethod and CtConstructor can be used to completely implement / rewrite a constructor or a method from scratch. + They also provide methods insertBefore(), insertAfter(), and addCatch(). + These are used for inserting a code fragment into the body of an existing method. » When implementing or completely rewriting a method from scratch, using CtNewMethod.make() is in my opinion the most convenient approach. + It enables the developer to implement a method by providing Java source code in a simple string. Implementing / Modifying a class
  25. 25. 25 Example
  26. 26. 26 » implement the getters and setters for the fields of the class TestData Javassist example (1) What comes here ? …
  27. 27. 27 » Injecting getters and setters at runtime: Javassist example (2)
  28. 28. 28 » Testing the injected code Javassist example (3)
  29. 29. 29 3. Java Instrumentation framework and applications
  30. 30. 30 » Java 5 was the first version to properly implement JSR-163 (Java Platform Profiling Architecture), including a bytecode instrumentation mechanism through the introduction of the Java Programming Language Instrumentation Services - JPLIS. » At first that JSR only mentioned native (C) interfaces but it evolved fast towards a pretty convenient Java API. » The key point of JSR-163 is JVMTI. + JVMTI - or Java Virtual Machine Tool Interface - allows a program to inspect the state and to control the execution of applications running in the Java Virtual Machine. + JVMTI is designed to provide an Application Programming Interface (API) for the development of tools that need access to the state of the JVM. + Examples of such tools are debuggers, profilers or runtime boilerplate code generators. JSR-163
  31. 31. 31 » The Java Instrumentation Framework was an interesting breakthrough since it allowed, with the help of an agent, modifying the bytecode of a class's methods so as to change its behavior at runtime. » The linkage problem + Javassist cannot modify a class after it has been loaded by a classloader ... as far as this classloader is concerned. + Whenever one tries to modify a class already loaded by the referenced classloader, the attempt to call pool.makeClass( ... ) will fail and complain that the class is frozen (i.e. already created via toClass()). + Being able to do that would require unloading the class from the reference classloader first. + And that is really pretty difficult (not impossible) … » The only (easy) way to overcome this problem is to change the class implementation using bytecode manipulation before the class is loaded by any classloader. + And happily this is pretty easy using a Java Agent Java Instrumentation Framework
  32. 32. 32 » In its essence, a Java agent is a regular Java class which follows a set of strict conventions. The agent class must implement a public static void premain(String agentArgs, Instrumentation inst) method which becomes an agent entry point (similar to the main method for regular Java applications). » Once the Java Virtual Machine (JVM) has initialized, each such premain(…) method of every agent will be called in the order the agents were specified on JVM start. When this initialization step is done, the real Java application main method will be called. Java Agents
  33. 33. 33 » A Java agent premain method takes the Instrumentation entry point - the interface java.lang.instrument.Instrumentation - as argument. » The most important API of java.lang.instrument.Instrumentation is the method void addTransformer(ClassFileTransformer transformer); » The ClassFileTransformer interface defines one single method byte[] transform(byte[] …) that is responsible for applying transformations to a class being loaded. » The transform(...) method is called for each and every class being loaded by a classloader. Behaviour of Agents
  34. 34. 34 Example (1) – A simple Logging Agent
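The agent code shown on the slide is not part of this transcript. As a stand-in, here is a minimal sketch of what a logging agent can look like using only the `java.lang.instrument` API; the class names `LoggingAgent` and `LoggingTransformer` are assumptions, not the author's code. The transformer prints the name of each class offered to it and returns null, which tells the JVM that the bytecode was left unchanged.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical agent, for illustration only.
public final class LoggingAgent {

    /** Transformer that only logs class names and leaves the bytecode untouched. */
    public static final class LoggingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            System.out.println("Loading class: " + className);
            return null; // null means "no transformation applied"
        }
    }

    /** Agent entry point, called by the JVM before the application's main method. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingTransformer());
    }
}
```

To run it as an agent, the class would be packaged in a jar whose manifest names it in the `Premain-Class` attribute, and the JVM would be started with `-javaagent:<path-to-jar>`. A real transformer would return modified bytes (for instance produced with Javassist) instead of null.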
  35. 35. 35 » When running from the command line, the Java agent can be passed to the JVM instance using the -javaagent argument, which has the following syntax: -javaagent:<path-to-jar>[=options]. » A Java agent needs to be packaged in a jar file, and that jar file needs a proper MANIFEST.MF file indicating the class containing the premain method. » A proper manifest file for the agent above should be packaged within the jar archive containing the agent classes under META-INF/MANIFEST.MF and would be as follows: » Now let's imagine we invoke our agent on a simple program defined as follows: Example (2) - Packaging
  36. 36. 36 Example (3) - Output
  37. 37. 37 4. Use Case A : Boilerplate Code Generation and Project Lombok
  38. 38. 38 » Project Lombok is a boilerplate code generator + Addresses one of the most frequent criticisms of Java: the volume of boilerplate code + Boilerplate code : code that is repeated in many parts of an application with only slight contextual changes and with little added value. » Project Lombok reduces the need for some of the worst offenders by replacing each of them with a simple annotation. » Importantly in our context, Lombok doesn't just generate Java sources or bytecode: it transforms the Abstract Syntax Tree (AST), by modifying its structure at compile-time. + The AST is a tree representation of the parsed source code, created by the compiler, similar to the DOM tree model of an XML file. + By modifying (or transforming) the AST, Lombok keeps the source code trim and free of bloat, unlike plain-text code-generation. + Lombok's generated code is also visible to classes within the same compilation unit, unlike direct bytecode manipulation. Project Lombok
  39. 39. 39 » Let's see an example. Imagine the following Java POJO: » The typical boilerplate code required for such a POJO is: + Getters and setters for all private fields, making them JavaBean properties + A nice toString method giving the values of its properties when an object is output on the console + Consistent hashCode and equals methods, making it possible to compare and manipulate two different objects with the same values + A default constructor without any argument (JavaBean standard) + An all-args constructor taking all values as arguments to build the instance Example Class
  40. 40. 40 » We want constructors : Without Lombok (1)
  41. 41. 41 » We want getter(s) / setter(s) : Without Lombok (2)
  42. 42. 42 » We want a toString() method Without Lombok (3)
  43. 43. 43 » We want consistent equals() and hashCode() methods Without Lombok (4)
  44. 44. 44 » No added value : an IDE can write this code for you ! » ratio of [Boilerplate code / Useful Code] of more than 1200% ! Without Lombok (5) Initial Class: 5 lines of code, 4 fields. Without Lombok: 60 lines of code, 4 fields, 2 constructors, 10 methods.
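The POJO used on the slides is not included in this transcript, so the following is a hypothetical two-field class, `Person`, written out by hand to show what the boilerplate listed above looks like: both constructors, getters and setters, `toString`, `equals` and `hashCode`. Only the two field declarations carry information; everything else is mechanical.

```java
import java.util.Objects;

// Hypothetical POJO, for illustration only (not the class from the slides).
public class Person {
    private String name;
    private int age;

    // Default constructor, as required by the JavaBean convention
    public Person() { }

    // All-args constructor
    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    @Override
    public String toString() {
        return "Person(name=" + name + ", age=" + age + ")";
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }
}
```

Adding a third field means touching the constructor, two new accessors, `toString`, `equals` and `hashCode`, which is exactly the maintenance cost a generator removes.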
  45. 45. 45 » With Lombok, the class becomes: » All these annotations are straightforward to understand With Lombok (1)
  46. 46. 46 » Thanks to the AST Transformation approach, the class really behaves as if all this boilerplate code was actually written ! » Much better ratio of [Boilerplate code / Useful Code] With Lombok (2) Initial Class: 5 lines of code, 4 fields. With Lombok: 10 lines of code, 4 fields, 5 annotations.
  47. 47. 47 BCG : Boilerplate Code Generator
  48. 48. 48 » The BCG tool - BCG for Boilerplate Code Generator - mimics Lombok and re-implements two features of the Lombok feature set: + toString() method generation + property getters and setters generation » BCG is a simple tool that uses Javassist and implements a Java agent. » BCG is not a production tool or anything like it; it is really just a Javassist example, intended to demonstrate how straightforward, simple and efficient it would be to re-implement Lombok features using Javassist ... + ... should one want to do that, which is not likely since Lombok works so well and is so easily extendable. » We will really only be mimicking project Lombok here using bytecode manipulation. + We are not implementing these features the same way Lombok does. + Lombok works at compile-time using AST Transformation. + We will be working at runtime using bytecode manipulation. Use case A: generation of boilerplate code
  49. 49. 49 » We want to be able to implement transformers that each perform one specific modification on target classes and are activated by the presence of one specific annotation on these classes. » Key idea : implement a Java Agent that analyzes each and every class just before it is loaded by the classloader and verifies whether this class needs to be transformed. » We want to implement Transformers that recognize classes declaring a specific annotation and proceed with the transformation of these classes. » We want the system to be easily extendable with new transformers. Principle
  50. 50. 50 Design
  51. 51. 51 Presenting the code er2/badtrash/entry/bytecode-manipulation-with-javassist-for1#sec43
  52. 52. 52 5. Use Case B : Simple and lightweight IoC Container
  53. 53. 53 » Inversion of Control is a design pattern related to lifecycle management of components in an application benefiting from a services architecture. » In such an application, business components are usually implemented in the form of various services : business services, business managers, DAOs, etc. + The main class delegates specific business concerns to business services, + which delegate finer aspects in their turn to managers, + which further delegate various business or technical aspects to smaller managers, DAOs, adapters, etc. » Managing the construction and instantiation of these services is called component lifecycle management. » Very often, business services are stateless components. » Traditionally, for a very long time these stateless services have been implemented as singletons. + this was a very convenient approach since the main singleton simply needed to get the other singletons it was using, + which in turn simply needed to get the other singletons they were using, and so on. Application Lifecycle Management and Singletons
  54. 54. 54 The problem with Singletons Difficult to unit test Hide Dependencies Promote Tight Coupling Violate SRP
  55. 55. 55 » Inversion of control is initially mostly an answer to this problem, + increase the modularity of the application + make it more extensible, + more importantly testable in an easier way by removing the strict dependencies between components. » Key idea: delegate lifecycle management and injection of dependencies to a container + the container takes care of instantiating the components, managing their lifecycle in the required scope, and injecting their dependencies at runtime. » Injecting the dependencies at runtime, with a configurable approach, using a configuration file, annotations or even a dedicated API, opens the possibility to inject a different implementation of a service depending on the context, as long as it respects the required interface. + Mock objects + Different Iand prod implementations + Etc. Inversion of Control
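The difference between a service that looks up its collaborators and one that has them injected can be sketched without any container. In the hypothetical example below (none of these names come from the slides), `WelcomeService` depends only on the `Greeter` interface and receives an implementation through its constructor, so a test can hand it a mock while production code hands it the real implementation.

```java
// Hypothetical demo classes, for illustration only.
public final class InjectionDemo {

    /** Contract the business service depends on. */
    public interface Greeter {
        String greet(String name);
    }

    /** "Production" implementation of the contract. */
    public static final class PoliteGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return "Good day, " + name;
        }
    }

    /** Service that receives its dependency instead of fetching a singleton. */
    public static final class WelcomeService {
        private final Greeter greeter;

        public WelcomeService(Greeter greeter) {
            this.greeter = greeter;
        }

        public String welcome(String name) {
            return greeter.greet(name) + "!";
        }
    }

    public static void main(String[] args) {
        // Production wiring
        System.out.println(new WelcomeService(new PoliteGreeter()).welcome("Alice"));
        // Test wiring: a lambda acts as a mock implementation
        System.out.println(new WelcomeService(n -> "hi " + n).welcome("Bob"));
    }
}
```

Here the wiring is done by hand in `main`; a container automates exactly this step, deciding which implementation to build and pass in.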
  56. 56. 56 IoC Container » With IoC, a container, called a lightweight container - as opposed to the very heavy (and very bad) Java EE containers - takes care of instantiating and managing the lifecycle of the components as well as, more importantly, injecting the dependencies in every component. » In a usual application, the lifecycle of components starts with a main component (or class) that either creates the other services it requires or gets their singletons. » These other components, in their turn, create or get references to their own dependencies, and so on.
  57. 57. 57 » The Spring Framework is an application framework and inversion of control container for the Java platform. The core of Spring is really about IoC and component management, but nowadays there is a complete ecosystem of tools and side frameworks around the Spring core aimed at web application development, ORM concerns, etc. » PicoContainer is a very lightweight IoC container and only that. Unlike Spring, it is designed to remain small and simple and targets only IoC concerns, nothing else. It is not heavily maintained. » Apache Tapestry is an open-source component-oriented Java web application framework conceptually similar to JavaServer Faces and Apache Wicket. It provides IoC features in addition to the web application framework. » Google Guice is an open source software framework for the Java platform released by Google. It provides support for dependency injection using annotations to configure Java objects. Various Frameworks
  58. 58. 58 SCIF: Simple and Cute IoC Framework
  59. 59. 59 » Implementing Dependency Injection is actually a state-of-the-art use case for Javassist and a nice way to present the possibilities and whereabouts of bytecode manipulation. » We'll see now how to use Javassist in the light of a concrete use case: the implementation in a little more than 300 lines of code of a lightweight, simple but cute IoC Container: SCIF - Simple and Cute IoC Framework. Use Case B : Implementing an IoC Container
  60. 60. 60 » SCIF - the system we want to build - is an MVP » We want it to implement Dependency Injection in its simplest form: + Services are managed by the framework and stored in a Service Registry + Services should declare the annotation @Service to be discovered by the framework. The framework searches the classpath for services declaring this annotation. + Dependencies are identified in services using the annotation @Resource. The framework analyzes services to discover their dependencies at runtime. ○ If @Resource is declared on a field, the framework injects the dependency directly, at build time. ○ If @Resource is declared on a getter, the framework uses bytecode manipulation to override the getter in a subclass and implement lazy loading of the dependency. + In case of getter (property) injection instead of field injection, SCIF is forced to generate a sub-class of the initial class and override the getter in that sub-class to implement lazy loading. Principle
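The field-injection half of this principle can be sketched with plain reflection. The code below is not SCIF: `MiniContainer`, its locally defined `@Service` and `@Resource` annotations, and the two sample services are hypothetical, classpath scanning is replaced by passing the service classes explicitly, and getter injection (which needs a generated subclass, hence bytecode manipulation) is left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini container, for illustration only.
public final class MiniContainer {

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    public @interface Service { }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    public @interface Resource { }

    // Service registry: one instance per service class
    private final Map<Class<?>, Object> registry = new HashMap<>();

    /** Instantiates every given @Service class, then fills their @Resource fields. */
    public MiniContainer(Class<?>... serviceClasses) {
        try {
            for (Class<?> c : serviceClasses) {
                if (!c.isAnnotationPresent(Service.class)) {
                    throw new IllegalArgumentException(c + " is not a @Service");
                }
                registry.put(c, c.getDeclaredConstructor().newInstance());
            }
            for (Object service : registry.values()) {
                for (Field f : service.getClass().getDeclaredFields()) {
                    if (!f.isAnnotationPresent(Resource.class)) continue;
                    Object dependency = registry.get(f.getType());
                    if (dependency == null) {
                        throw new IllegalStateException("No service for " + f.getType());
                    }
                    f.setAccessible(true); // the field may be private
                    f.set(service, dependency);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Wiring failed", e);
        }
    }

    public <T> T get(Class<T> type) {
        return type.cast(registry.get(type));
    }

    @Service
    public static class Repository {
        public String load() { return "data"; }
    }

    @Service
    public static class Manager {
        @Resource private Repository repository;
        public String describe() { return "manager sees " + repository.load(); }
    }
}
```

A possible use: `new MiniContainer(MiniContainer.Repository.class, MiniContainer.Manager.class).get(MiniContainer.Manager.class).describe()`. The lookup is by exact class, so injecting by interface or by name would need a richer registry.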
  61. 61. 61 What do we want ?
  62. 62. 62 Design
  63. 63. 63 Presenting the code er2/badtrash/entry/bytecode-manipulation-with-javassist-for#sec43 1 2 3 4
  64. 64. 64 6. Conclusion
  65. 65. 65 » Bytecode manipulation is a lot of fun and opens a whole new world of possibilities on the JVM. » It's the only way to implement advanced tooling such as IoC Containers, ORM frameworks, boilerplate code generators, etc. Normally, bytecode manipulation is rather difficult to achieve ... except with Javassist. » Javassist makes bytecode manipulation easy and straightforward. The ability to write actual Java source code in simple strings and add it on the fly, as bytecode, to the classes being manipulated is striking. » Javassist is in my opinion the simplest way to perform bytecode manipulation in Java. » My own use cases: + ENTER / LEAVE + jt-property …
  66. 66. 66 Thanks for listening