HIGH PERFORMANCE
INSTRUMENTATION
Jaroslav Bachorík
@yardus, @btraceio
Prague, 20-21 October 2016
ABOUT ME
Jaroslav Bachorík Prague, 20-21 October 2016
Jaroslav Bachorík, j.bachorik@btrace.io, jaroslav@unraveldata.com
@yardus
PERFORMANCE
● Quantifiable
○ startup time
○ request latency
○ CPU usage
○ Memory usage
● Reproducible
○ controlled environment
○ consistent results
● Measurable
○ strictly defined target goals
● Benchmarking
INSTRUMENTATION
int method() {
    MyObject o = new MyObject();
    int x = o.getCount();
    // instrumentation: log the instance and its count
    logger.debug("Instance " + o + " has count " + x);
    return x;
}
INSTRUMENTATION
● APIs and code providing means to monitor and control application
○ loggers
○ stat counters
○ profilers
● Decoupled from the application
○ application works properly without instrumentation
○ same instrumentation may work for multiple applications
SOURCE LEVEL INSTRUMENTATION
● Instrumentation part of the source base
○ OS
■ dtrace
■ systemtap
○ Runtime
■ JFR
■ jstat counters
○ Application
■ logging
● Difficult to modify and extend
○ requires access to sources
○ rebuild & redistribution
BYTECODE LEVEL INSTRUMENTATION
● No source code modifications
● Modifying bytecode
○ result of Java source compilation
○ binary executable consumed by JVM
● Bytecode Injection (BCI)
○ during compilation
■ e.g. Maven AOP plugins
■ same drawbacks as static (source level) instrumentation
○ during class loading
■ JVM agent and class transformers
[Figure: JVM with a JVM agent; classes pass from the classloader through a chain of transformers]
CLASS TRANSFORMERS
java.lang.instrument.ClassFileTransformer
byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                 ProtectionDomain protectionDomain, byte[] classfileBuffer)
● Inspect and modify the class data
○ complex task
■ constant pool
■ stack frame map
○ better delegate to specialized tools
■ ASM
■ ByteBuddy
■ CGLIB
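A minimal sketch of how such a transformer might be wired into an agent, using only `java.lang.instrument` from the JDK. The target class `com/example/Target`, the names `SketchAgent` and `SketchTransformer`, and the `rewrite` placeholder are assumptions made for illustration; a real agent would delegate the rewriting to a library such as ASM.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class SketchAgent {
    // Hypothetical target, in the internal (slash-separated) form the JVM passes in.
    static final String TARGET = "com/example/Target";

    static final class SketchTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // Returning null tells the JVM that this class is left unchanged.
            if (className == null || !className.equals(TARGET)) {
                return null;
            }
            return rewrite(classfileBuffer);
        }
    }

    // Placeholder only: returns an unmodified copy of the class bytes.
    static byte[] rewrite(byte[] original) {
        return original.clone();
    }

    // Entry point named by the Premain-Class attribute of the agent jar manifest.
    public static void premain(String args, Instrumentation inst) {
        // 'true' marks the transformer as capable of retransformation.
        inst.addTransformer(new SketchTransformer(), true);
    }
}
```

The agent would be started with `-javaagent:` pointing at a jar whose manifest names `SketchAgent` as `Premain-Class`.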
DYNAMIC INSTRUMENTATION w/ BCI
● Required steps
○ Create and register JVM agent
○ Create and register class transformers
○ Prepare injected bytecode
■ create bytecode
■ validate bytecode
○ Inject bytecode
■ merge class bytecode w/ injected bytecode
■ validate merged bytecode
■ redefine/retransform class using merged bytecode
BTRACE
● Bytecode level instrumentation simplified
○ JVM agent
○ Class Transformers
○ Optimized bytecode injection
○ Safety guarantees
● Injected code as POJO
○ annotations specify where injection should go
○ code specifies what should be injected
● Started as a research project at Sun JDK Serviceability
BTRACE SCRIPT
● Easy access to
○ class and method name and parameters
○ enclosing instance
○ return value
○ method duration
○ fields via reflection
■ immutable, guarded, access only
● Interfacing via
○ stdout
○ file
○ JMX (MXBean)
○ jstat counters
BTRACE SCRIPT
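// BTrace 1.x package names
import com.sun.btrace.annotations.*;
import static com.sun.btrace.BTraceUtils.*;

@BTrace
public class AllMethods {
    // Fires on entry to every method of every class under javax.swing
    @OnMethod(clazz="/javax\\.swing\\..*/", method="/.*/")
    public static void m(@Self Object o, @ProbeClassName String probeClass,
                         @ProbeMethodName String probeMethod) {
        println("this = " + o);
        print("entered " + probeClass);
        println("." + probeMethod);
    }
}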
> this = DerivedColor(color=192,192,193)
> entered javax.swing.plaf.nimbus.DerivedColor.getRGB
INSTRUMENTATION PERFORMANCE
PERFORMANCE IMPACT
● Class (re)transformation
○ application startup time
● Injected bytecode instructions
○ CPU usage
○ JIT optimizer decisions
○ heap usage
○ GC activity
● Instrumentation framework
○ additional drain of resources (CPU, RAM)
SPARK SPECIFICS
● Distributed environment
● Worker JVMs come and go
○ startup time is important
● The inner parts are frequently executed
○ RDD (Resilient Distributed Dataset) iterators
○ latency/overhead of injected code is important
● Startup time equally important as latency/overhead
CLASS (RE)TRANSFORMATION
● Affects application startup time
● Major impact on short lived applications
● Usually a small number of classes will be instrumented
○ optimize class filter for non-match
● Minimize overhead of parsing class files
○ register as few transformers as possible
○ consider smart caching of the transformed class data
● Example: Spark driver
○ lifespan easily just a few minutes
○ 10^4+ classes loaded at startup
○ optimizing class transformation decreased overall overhead by >1.5%
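A filter optimized for the non-match case could look roughly like the sketch below: it decides from the class name alone, so the vast majority of classes are rejected before any class file parsing takes place. The skipped prefixes and the `ClassFilter` name are illustrative assumptions; the single exact match reuses the Spark `RDD` class named elsewhere in the deck.

```java
import java.util.Set;

public class ClassFilter {
    // Hypothetical injection targets, in JVM internal name form.
    private static final Set<String> EXACT = Set.of("org/apache/spark/rdd/RDD");

    // Prefixes assumed never to contain injection points; checked first.
    private static final String[] SKIP_PREFIXES = {"java/", "javax/", "jdk/", "sun/"};

    // Uses only the class name, so no class file bytes are parsed for a non-match.
    static boolean shouldInstrument(String internalName) {
        if (internalName == null) {
            return false;
        }
        for (String prefix : SKIP_PREFIXES) {
            if (internalName.startsWith(prefix)) {
                return false;
            }
        }
        return EXACT.contains(internalName);
    }

    public static void main(String[] args) {
        System.out.println(shouldInstrument("java/lang/String"));
        System.out.println(shouldInstrument("org/apache/spark/rdd/RDD"));
    }
}
```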
INJECTED BYTECODE
● Affects the application runtime performance
● Keep injected code as simple as possible
○ no non-deterministic loops
○ minimize external method calls
■ escape analysis
○ prefer working with stack instead of fields
■ method arguments
■ local variables
● Smart activation of injected code
○ sampling
○ injection guards
ESCAPING OBJECTS
● A local instance escapes via injected instrumentation
○ affects GC and JIT optimizer decisions
// without instrumentation
int method() {
    MyObject o = new MyObject();
    int x = o.getCount();
    return x;
}

// with injected instrumentation
int method() {
    MyObject o = new MyObject();
    int x = o.getCount();
    // inspect the instance providing the count;
    // the local instance 'o' escapes the method scope
    Instrumentation.inspect(o);
    return x;
}
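One way to keep the instance from escaping, sketched under the assumption that the probe only needs the count: pass the primitive value that was already computed instead of the object itself. `Probe.record` is a hypothetical stand-in for the injected call and is not a BTrace API.

```java
public class EscapeSketch {
    static final class MyObject {
        int getCount() {
            return 42;
        }
    }

    // Hypothetical probe: it takes a primitive, so no object reference is handed out.
    static final class Probe {
        static long total;

        static void record(int count) {
            total += count;
        }
    }

    static int method() {
        MyObject o = new MyObject();
        int x = o.getCount();
        // Only the int is passed to the probe; the reference to 'o' never leaves this method.
        Probe.record(x);
        return x;
    }

    public static void main(String[] args) {
        System.out.println(method());
    }
}
```

Whether the JIT then eliminates the allocation depends on the JVM and its inlining decisions, so the effect should be measured rather than assumed.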
GC INTERFERENCE
● Minimize instrumentation interference with GC
○ use off-heap data structures where possible
○ specialized primitive collections
○ specialized queues in runtime (e.g. JCTools)
● Reduce instantiations to minimum
○ boxing
○ string concatenation
○ varargs
● Collect only raw data
○ aggregations on different JVM or host
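As an illustration of collecting raw data without per-event allocation, the sketch below stores (probe id, value) pairs as primitives in a preallocated ring buffer, avoiding boxing, string concatenation and varargs on the hot path. The `RawEventBuffer` class is hypothetical and single-threaded; a real agent would more likely rely on a concurrent queue such as those in JCTools.

```java
public class RawEventBuffer {
    private final long[] slots;   // allocated once; two longs per event
    private final int mask;       // capacity - 1, valid because capacity is a power of two
    private long writeIndex;      // total number of events recorded so far

    RawEventBuffer(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo < 1 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new long[capacityPowerOfTwo * 2];
        mask = capacityPowerOfTwo - 1;
    }

    // Records one event without allocating; the oldest entry is overwritten when full.
    void record(int probeId, long value) {
        int base = ((int) (writeIndex++ & mask)) << 1;
        slots[base] = probeId;
        slots[base + 1] = value;
    }

    long recorded() {
        return writeIndex;
    }

    // Reads the value stored in the slot that the given event index maps to.
    long valueAt(long eventIndex) {
        return slots[(((int) (eventIndex & mask)) << 1) + 1];
    }

    public static void main(String[] args) {
        RawEventBuffer buffer = new RawEventBuffer(4);
        buffer.record(1, System.nanoTime());
        System.out.println(buffer.recorded());
    }
}
```

Aggregation is deliberately left out: the raw pairs would be shipped to another JVM or host and summarized there.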
STACK UNWINDING
● Reuse the values stored on stack
[Figure: bytecode next to the contents of the Java stack]
Original call site:
    GETSTATIC TestClass.name : Ljava/lang/String;
    LLOAD 2
    INVOKESPECIAL C.m (Ljava/lang/String;J)J
Java stack shown for the call: String: "name", Long: 2 (H), Long: 2 (L)
Injected instructions shown: DUP_X2, DUP2_X1, INVOKESTATIC Probe.p(Ljava/lang/String;J)V
TIMESTAMP FOLDING
● Timestamps are expensive
○ TSC correlated across cores
○ monotonic counter values adjusted for core frequencies
● Minimize number of requested timestamps
○ fold in subsequent calls to System.nanoTime()
● BTrace will optimize timestamps for @Duration parameters
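The folding idea can be illustrated in plain Java: when consecutive steps are timed, the timestamp that ends one step is reused as the start of the next, so N steps need N + 1 clock reads instead of 2N. This is a sketch of the general technique only; the `TimestampFolding` class is hypothetical and does not show how BTrace implements the optimization.

```java
import java.util.function.LongSupplier;

public class TimestampFolding {
    // Times consecutive steps using steps.length + 1 clock reads.
    static long[] timeSteps(LongSupplier clock, Runnable[] steps) {
        long[] durations = new long[steps.length];
        long last = clock.getAsLong();
        for (int i = 0; i < steps.length; i++) {
            steps[i].run();
            long now = clock.getAsLong();
            durations[i] = now - last;
            last = now; // fold: the end of step i doubles as the start of step i + 1
        }
        return durations;
    }

    public static void main(String[] args) {
        Runnable[] steps = { () -> { }, () -> { } };
        long[] durations = timeSteps(System::nanoTime, steps);
        System.out.println(durations.length + " steps timed");
    }
}
```

Passing the clock as a `LongSupplier` is a design choice made here so the number of reads can be checked with a fake clock.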
INVOCATION SAMPLING
● Instrumented methods are frequently executed
○ injected code causing high overhead
● Short methods experience disproportionate overhead
● Rely on a statistically relevant sample instead
○ execute only on every Nth pass
○ adjust N for acceptable overhead and detail
● Use @Sampled annotation in BTrace
○ fixed N
○ dynamically adjusted N for guaranteed overhead
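A fixed-N sampler can be sketched in a few lines of plain Java. The `NthSampler` class below is a hypothetical illustration of "execute only on every Nth pass", not the code BTrace generates for `@Sampled`; it uses a plain counter, so under concurrent use the effective N may drift slightly.

```java
public class NthSampler {
    private final int n;
    private int counter; // plain field: cheap, but not exact when shared between threads

    NthSampler(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        this.n = n;
    }

    // Returns true on every n-th call; the common path is one increment and one compare.
    boolean shouldSample() {
        if (++counter < n) {
            return false;
        }
        counter = 0;
        return true;
    }

    public static void main(String[] args) {
        NthSampler sampler = new NthSampler(20);
        int sampled = 0;
        for (int i = 0; i < 100; i++) {
            if (sampler.shouldSample()) {
                sampled++;
            }
        }
        System.out.println(sampled); // prints 5
    }
}
```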
SAMPLING IN BTRACE
@BTrace
public class ArgsDurationSampled {
    @OnMethod(clazz="/.*\\.OnMethodTest/", method="args",
              location=@Location(value=Kind.RETURN))
    @Sampled(kind = Sampled.Sampler.Const, mean = 20)
    public static void args(@Self Object self, @Return long retVal, @Duration long dur) {
        println("args");
    }
}
// Adaptive sampler keeps 'mean' nanoseconds between samples on average
@Sampled(kind = Sampled.Sampler.Adaptive, mean = 300)
INJECTION GUARDS
● Fastest code is the one never executed
● Think of Logger levels
● Class retransformation is costly
● Introducing injection guards
○ injected code executed only when a condition is met
○ minimal overhead when not executing injected code
■ fast field check
● Use @Level annotation in BTrace
@OnMethod(clazz="org.apache.spark.rdd.RDD",
method="iterator",
enableAt=@Level(">=" + SAMPLING_LEVEL),
location=@Location(Kind.RETURN))
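Conceptually, an injection guard behaves like the logger-level check in the sketch below: the probe body runs only when a runtime-adjustable level reaches a threshold, and otherwise the cost is a single field read and compare. The `GuardSketch` class, its `level` field and the threshold value are assumptions made for illustration; the code BTrace actually generates for `@Level` is not shown in the slides.

```java
public class GuardSketch {
    static final int SAMPLING_LEVEL = 2;   // hypothetical threshold
    static volatile int level = 0;         // adjustable at runtime, no class retransformation needed
    static long probeHits;

    // Stand-in for the injected probe body.
    static void probe() {
        probeHits++;
    }

    static int instrumentedMethod(int x) {
        // Guard: below the threshold only this field read and compare are executed.
        if (level >= SAMPLING_LEVEL) {
            probe();
        }
        return x + 1;
    }

    public static void main(String[] args) {
        instrumentedMethod(1);          // guard closed, probe skipped
        level = SAMPLING_LEVEL;
        instrumentedMethod(1);          // guard open, probe runs
        System.out.println(probeHits);
    }
}
```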
LESSONS LEARNED
HIGH PERFORMANCE INSTRUMENTATION
● Fast filters for identifying injection points
● Minimal and optimized code for injection
○ use timestamps sparingly
○ beware of callbacks from injected code
○ prefer stack manipulation over field retrievals
● Be gentle to GC
● Use sampling when possible
○ getting overhead down
○ still obtaining valid insights
● Enable turning off injection when not needed
○ class retransformation is slow
○ injection guards
Resources
● BTrace (https://github.com/btraceio/btrace)
○ Contributors welcome!
● ASM (http://asm.ow2.org/index.html)
● CGLIB (https://github.com/cglib/cglib)
● ByteBuddy (http://bytebuddy.net/#/)
● JCTools (https://github.com/JCTools/JCTools)
Q&A
THANK YOU!
j.bachorik@btrace.io, @yardus

GeeCon 2016 - High Performance Instrumentation (handout)
