This document discusses bytecode manipulation in Java. It begins with an overview of bytecode and how it works. It then covers how to manipulate bytecode using tools like Javassist. Key reasons for manipulating bytecode are discussed, such as program analysis, class generation, security, and transforming classes without source code. The document proposes building a Java agent that tracks MongoDB operations and exposes metrics via JMX. It concludes by asking if anyone has any questions.
2. About myself
• CEO/CTO Plexteq OÜ
• Ph.D. in information technology
• Interests
• Software architecture
• High-load systems
• Everything under the hood
• AI/ML + BigData
• Knowledge sharing ;)
• Follow me
• https://twitter.com/amoskvin
• https://www.facebook.com/moskvin.aleksey
3. Agenda
1. What is bytecode
2. How to manipulate bytecode
3. Why would we need to manipulate bytecode
4. Let’s have some fun
1. Java agent
2. Expose Java agent metrics via JMX
3. Use Java agent with regular Spring Boot application
5. Bytecode
Java bytecode is the result of the compilation of a Java program, an
intermediate representation of that program which is machine
independent.
The Java bytecode gets processed by the Java virtual machine (JVM)
instead of the processor. It is the job of the JVM to make the necessary
resource calls to the processor in order to run the bytecode.
12. Bytecode
• Magic Number: 0xCAFEBABE
• Version of Class File Format: the minor and major versions of the class file
• Constant Pool: Pool of constants for the class
• Access Flags: for example whether the class is public, final, abstract, an interface, etc.
• This Class: The name of the current class
• Super Class: The name of the super class
• Interfaces: Any interfaces implemented by the class
• Fields: Any fields in the class
• Methods: Any methods in the class
• Attributes: Any attributes of the class (for example the name of the
sourcefile, etc.)
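The first fields of this layout are easy to check by hand. The following is a minimal sketch, not part of the original talk, that reads the magic number and version from a compiled class; the class name `ClassFileHeader` and its `read` helper are hypothetical names chosen for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {
    final int magic;
    final int minor;
    final int major;

    ClassFileHeader(int magic, int minor, int major) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
    }

    // Reads the first 8 bytes of the .class file that backs the given class.
    static ClassFileHeader read(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("Class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();           // u4 magic
            int minor = data.readUnsignedShort(); // u2 minor_version
            int major = data.readUnsignedShort(); // u2 major_version
            return new ClassFileHeader(magic, minor, major);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassFileHeader h = read(String.class);
        System.out.printf("magic=0x%08X major=%d minor=%d%n", h.magic, h.major, h.minor);
    }
}
```

The major version printed depends on the JDK that compiled the class (for example, 52 corresponds to Java 8).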
16. Bytecode
• JVM thread stack: each thread of execution has its own stack, which stores frames that represent method invocations
• Frame: each frame holds data and partial results, handles method return values and performs dynamic linking
• Operand stack: each frame contains its own stack, known as the operand stack, which holds operand values of JVM types
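To make the operand stack concrete, here is a small sketch with a hypothetical `Adder` class. The comments show the instructions that `javap -c` typically prints for such a static method, along with the state of the operand stack after each one.

```java
class Adder {
    // Typical javap -c output for this method:
    //   0: iload_0    // push local variable 0 (a)      stack: [a]
    //   1: iload_1    // push local variable 1 (b)      stack: [a, b]
    //   2: iadd       // pop two ints, push their sum   stack: [a + b]
    //   3: ireturn    // pop the int and return it to the caller's frame
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```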
17. Bytecode
Java bytecode instructions fall into these major categories:
• Load and store
• Method invocation and return
• Control transfer
• Arithmetic operations
• Type conversion
• Object manipulation
• Operand stack management
18. Bytecode
Call to super()
System.out.println("Hello world");
• Get the System.out value from a static field (i.e. PrintStream o = java.lang.System.out;)
• Push "Hello world" onto the operand stack
• Invoke the java.io.PrintStream.println method on the PrintStream, passing the string from the operand stack
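The steps above can be matched against the output of `javap -c`. The sketch below uses a hypothetical `Hello` class; the listing in the comments is roughly what the disassembler shows, and the constant pool indices (`#1`, `#2`, ...) are placeholders that vary between compiler versions.

```java
class Hello {
    public static void main(String[] args) {
        System.out.println("Hello world");
    }
}

// javap -c Hello prints roughly the following:
//
//   Hello();                       // implicit default constructor
//      0: aload_0
//      1: invokespecial #1         // Method java/lang/Object."<init>":()V  (the call to super())
//      4: return
//
//   public static void main(java.lang.String[]);
//      0: getstatic     #2         // Field java/lang/System.out:Ljava/io/PrintStream;
//      3: ldc           #3         // String Hello world
//      5: invokevirtual #4         // Method java/io/PrintStream.println:(Ljava/lang/String;)V
//      8: return
```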
21. Bytecode
Compile-time optimization
• “new” allocates StringBuilder object on heap
• “dup” duplicates the value on top of the stack
• “invokespecial” invokes the constructor
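The instructions listed above are what javac emits for string concatenation when targeting Java 8 and earlier. The sketch below, with a hypothetical `ConcatDemo` class, shows the source form next to the hand-written equivalent. Note that from Java 9 onward javac usually compiles the `+` form to an `invokedynamic` call instead, so the `StringBuilder` sequence appears only for older targets.

```java
class ConcatDemo {
    // What the programmer writes.
    static String viaOperator(String name) {
        return "Hello, " + name + "!";
    }

    // What javac (target 8 and earlier) effectively generates for the method above:
    //   new           java/lang/StringBuilder   // allocate the object
    //   dup                                     // duplicate the reference for the constructor call
    //   invokespecial StringBuilder.<init>      // run the constructor
    //   ... invokevirtual StringBuilder.append  // once per operand
    //   invokevirtual StringBuilder.toString
    static String viaBuilder(String name) {
        return new StringBuilder().append("Hello, ").append(name).append("!").toString();
    }

    public static void main(String[] args) {
        System.out.println(viaOperator("world"));
        System.out.println(viaBuilder("world"));
    }
}
```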
23. Bytecode manipulation
• Bytecode is just an array of bytes
• The bytecode of a class represents its method declarations together with their instruction sets
• Java class loading mechanisms allow bytecode to be loaded from anywhere
• Bytecode can be generated at runtime
• The bytecode of an application can be changed at runtime using agents
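Because a class is just a byte array, any source of bytes can become a class. The following is a minimal sketch with hypothetical names (`BytesClassLoader`, `Sample`): it reads the bytes of an already compiled class and defines them again in a separate class loader. It assumes the compiled `.class` files are reachable as resources on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BytesClassLoader extends ClassLoader {
    BytesClassLoader() {
        super(null); // only delegate to the bootstrap loader (java.* classes)
    }

    // Turns a raw byte array into a Class object owned by this loader.
    Class<?> define(String name, byte[] bytecode) {
        return defineClass(name, bytecode, 0, bytecode.length);
    }

    // Reads the compiled bytes of an existing class from the classpath.
    static byte[] readBytecode(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("Class file not found: " + resource);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = readBytecode(Sample.class);
        Class<?> copy = new BytesClassLoader().define(Sample.class.getName(), bytes);
        System.out.println(copy.getName() + " defined from " + bytes.length + " bytes");
        System.out.println("Same Class object as the original? " + (copy == Sample.class));
    }
}

// Hypothetical class whose bytes are loaded a second time.
class Sample {
}
```

The bytes could equally come from a network socket, a database, or a bytecode generation library; the `defineClass` call stays the same.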
31. Bytecode manipulation
• Program analysis:
• find bugs in your application
• examine code complexity
• find classes with a specific annotation
• Class generation:
• lazy load data from a database using proxies
• Security:
• restrict access to certain APIs
• code obfuscation
• Transforming classes without the Java source code:
• code profiling / instrumentation
• code optimization
33. Bytecode manipulation :: Java agent
• We can use a core Java feature introduced in 1.5 to manipulate the
bytecode. This feature is called a Java agent.
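A minimal agent skeleton might look like the sketch below. It relies only on the standard `java.lang.instrument` API; the names `CountingAgent` and `CountingTransformer` are hypothetical, and the transformer merely counts the classes it sees instead of rewriting them. In a real agent jar the agent class would normally be declared public in its own file, named in the `Premain-Class` manifest attribute, and activated with `-javaagent:agent.jar`.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

class CountingAgent {
    static final AtomicInteger classesSeen = new AtomicInteger();

    // Called by the JVM before the application's main method when -javaagent is used.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new CountingTransformer());
    }

    static class CountingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader,
                                String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // className uses the internal form, e.g. "com/example/Foo".
            classesSeen.incrementAndGet();
            // A real agent would return modified bytes here;
            // returning null tells the JVM to keep the class unchanged.
            return null;
        }
    }
}
```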
38. Fun
Let’s have some fun!
Good point to start with:
https://github.com/tomsquest/java-agent-asm-javassist-sample
39. Fun
Specifically:
1. Let’s code a Java agent that records invocations of MongoDB Java
driver operations (insert, update, find)
2. Expose stats using JMX
3. Try the agent with a real Spring Boot app
40. Fun :: JMX
Java Management Extensions (JMX) is a Java technology that supplies
tools for managing and monitoring applications, system objects,
devices (such as printers) and service-oriented networks.
Specified by JSR 3; the remote API is covered by JSR 160
Cool and simple implementation:
https://github.com/j256/simplejmx
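For comparison, counters can also be exposed with the plain JDK JMX API, without any helper library. The sketch below is an assumption about what such metrics could look like, not the code from the talk: `MongoMetrics`, `Stats`, `StatsMBean` and the object name `com.example.agent:type=MongoStats` are all hypothetical. It follows the standard MBean convention that the management interface is public and named after the implementation class plus the `MBean` suffix.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.ObjectName;

class MongoMetrics {
    // Management interface: each getter becomes a read-only JMX attribute.
    public interface StatsMBean {
        long getInsertCount();
        long getUpdateCount();
        long getFindCount();
    }

    public static class Stats implements StatsMBean {
        private final AtomicLong inserts = new AtomicLong();
        private final AtomicLong updates = new AtomicLong();
        private final AtomicLong finds = new AtomicLong();

        // Called from instrumented code each time an operation is observed.
        public void recordInsert() { inserts.incrementAndGet(); }
        public void recordUpdate() { updates.incrementAndGet(); }
        public void recordFind() { finds.incrementAndGet(); }

        @Override public long getInsertCount() { return inserts.get(); }
        @Override public long getUpdateCount() { return updates.get(); }
        @Override public long getFindCount() { return finds.get(); }
    }

    // Registers the bean with the platform MBean server under the given name.
    static ObjectName register(Stats stats, String name) {
        try {
            ObjectName objectName = new ObjectName(name);
            ManagementFactory.getPlatformMBeanServer().registerMBean(stats, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads an attribute the way a JMX client such as JConsole would.
    static Object read(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Stats stats = new Stats();
        ObjectName name = register(stats, "com.example.agent:type=MongoStats");
        stats.recordInsert();
        stats.recordFind();
        System.out.println("InsertCount = " + read(name, "InsertCount"));
    }
}
```

Once registered, the attributes show up in any JMX console attached to the JVM.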
In Javassist, the ClassPool is a container of CtClass objects, implemented as a hash table in which the key is the class name and the value is the CtClass object representing that class.
A CtClass (compile-time class) object represents a class file.
The default ClassPool uses the same classpath as the underlying JVM.
Question to present
Funny case with lost source code and production patch release using bytecode manipulation with agents