Software Profiling
Java Performance, Profiling and
Flamegraphs
M. Isuru Tharanga Chrishantha Perera, Technical Lead at WSO2, Co-organizer of Java Colombo Meetup
Software Profiling
● Profiling can help you to analyze the performance of your applications
and improve poorly performing sections in your code
Software Profiling
Wikipedia definition:
In software engineering, profiling ("program profiling", "software profiling") is a
form of dynamic program analysis that measures, for example, the space
(memory) or time complexity of a program, the usage of particular instructions, or
the frequency and duration of function calls. Most commonly, profiling
information serves to aid program optimization.
https://en.wikipedia.org/wiki/Profiling_(computer_programming)
Software Profiling
Wikipedia definition:
Profiling is achieved by instrumenting either the program source code or its
binary executable form using a tool called a profiler (or code profiler). Profilers
may use a number of different techniques, such as event-based, statistical,
instrumented, and simulation methods.
https://en.wikipedia.org/wiki/Profiling_(computer_programming)
Measuring Performance of Servers
Measuring Performance
We need a way to measure the performance:
● To understand how the system behaves
● To see performance improvements after doing any optimizations
There are two key performance metrics.
● Response Time/Latency
● Throughput
Throughput
Throughput measures the number of messages that a server processes
during a specific time interval (e.g. per second).
Throughput is calculated using the equation:
Throughput = number of requests / time to complete the requests
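As a minimal sketch, the equation above can be written directly in Java. The class name, method name and the sample numbers are hypothetical and only illustrate the arithmetic.

```java
class ThroughputExample {
    // Throughput = number of requests / time to complete the requests.
    static double throughput(long requests, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        return requests / elapsedSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical run: 12,000 requests completed in 60 seconds.
        System.out.println(throughput(12_000, 60.0) + " requests/second");
    }
}
```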
Response Time/Latency
Response time is the end-to-end processing time for an operation.
Benchmarking Tools
● Apache JMeter
● Gatling
● wrk - HTTP benchmarking tool
● Vegeta - HTTP load testing tool
Tuning Java Applications
● We want very high throughput and very low latency.
● There is a tradeoff between throughput and latency. With more
concurrent users, the throughput typically increases, but the average
latency will also increase.
● Usually, you need to achieve maximum throughput while keeping latency
within some acceptable limit. For example, you might choose the maximum
throughput in a range where latency is less than 10 ms.
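The selection rule in the last bullet can be sketched as a small search over benchmark results. The measurements below are made-up numbers, and the class and method names are hypothetical; in practice the values would come from a load-testing tool.

```java
class TuningChoice {
    // Returns the index of the measurement with the highest throughput whose
    // latency is below the limit, or -1 if no measurement qualifies.
    static int bestWithinLimit(double[] throughput, double[] latencyMs, double limitMs) {
        int best = -1;
        for (int i = 0; i < throughput.length; i++) {
            boolean withinLimit = latencyMs[i] < limitMs;
            if (withinLimit && (best < 0 || throughput[i] > throughput[best])) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical results for increasing numbers of concurrent users.
        int[] users = {10, 50, 100, 200};
        double[] throughput = {900, 3200, 4100, 4300}; // requests/second
        double[] latencyMs = {2.1, 6.5, 9.4, 18.0};    // average latency
        int best = bestWithinLimit(throughput, latencyMs, 10.0);
        System.out.println("Chosen load: " + users[best] + " users");
    }
}
```

With these numbers the 200-user run has the highest throughput but is rejected because its latency exceeds the 10 ms limit.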
Throughput and Latency Graphs
Source: https://www.infoq.com/articles/Tuning-Java-Servers
Response Time/Latency Distribution
When measuring response time, it’s important to look at the whole
distribution: min, max, avg, median, 75th percentile, 98th percentile, 99th
percentile, etc.
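To illustrate why the whole distribution matters, here is a small sketch that computes percentiles with the nearest-rank method. The latency samples are invented, and nearest-rank is only one of several percentile definitions; benchmarking tools may use a different one.

```java
import java.util.Arrays;

class LatencyDistribution {
    // Nearest-rank percentile over an ascending-sorted array of samples.
    static long percentile(long[] sortedMillis, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sortedMillis.length);
        return sortedMillis[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical response times in milliseconds, with one slow outlier.
        long[] latencies = {12, 9, 11, 10, 250, 13, 10, 9, 11, 15};
        Arrays.sort(latencies);
        double avg = Arrays.stream(latencies).average().orElse(0);
        System.out.println("avg=" + avg
                + " median=" + percentile(latencies, 50)
                + " p75=" + percentile(latencies, 75)
                + " p99=" + percentile(latencies, 99));
        // prints: avg=35.0 median=11 p75=13 p99=250
    }
}
```

A single outlier pulls the average to 35 ms even though most requests finish in about 11 ms, which is why the average alone can be misleading.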
Long-tail latencies
Long-tail latencies occur when high percentiles have values much
greater than the average latency.
Source:
https://engineering.linkedin.com/performance/who-moved-my-99th-percentile-latency
Latency Numbers Every Programmer Should Know
Operation                           ns              us          ms      Notes
L1 cache reference                  0.5 ns
Branch mispredict                   5 ns
L2 cache reference                  7 ns                                14x L1 cache
Mutex lock/unlock                   25 ns
Main memory reference               100 ns                              20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy        3,000 ns        3 us
Send 1K bytes over 1 Gbps network   10,000 ns       10 us
Read 4K randomly from SSD*          150,000 ns      150 us              ~1GB/sec SSD
Read 1 MB sequentially from memory  250,000 ns      250 us
Round trip within same datacenter   500,000 ns      500 us
Read 1 MB sequentially from SSD*    1,000,000 ns    1,000 us    1 ms    ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000 ns   10,000 us   10 ms   20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000 ns   20,000 us   20 ms   80x memory, 20X SSD
Send packet CA->Netherlands->CA     150,000,000 ns  150,000 us  150 ms
Why do we need Profiling?
● Improve throughput (Maximizing the transactions processed per second)
● Improve latency (Minimizing the time taken for each operation)
● Find performance bottlenecks
Java Garbage Collection
Java Garbage Collection
● Java automatically allocates memory for our applications and
automatically deallocates memory when certain objects are no longer
used.
● "Automatic Garbage Collection" is an important feature in Java.
● As Java developers, we don't have to worry about memory
allocations/deallocations, as Java takes care of managing
memory for us.
Marking and Sweeping Away Garbage
● GC works by first marking all used objects in the heap and then deleting
unused objects.
● GC also compacts the memory after deleting unreferenced objects to
make new memory allocations much easier and faster.
GC roots
● The JVM references GC roots, which refer to the application objects in a tree
structure. There are several kinds of GC roots in Java.
○ Local Variables
○ Active Java Threads
○ Static variables
○ JNI references
● Every object that can be reached from a GC root is a live object. By
following the references from the roots, the GC can determine which objects are live.
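The idea of marking from GC roots and then sweeping can be sketched with a toy object graph. This is only an illustration of reachability, not how HotSpot implements its collectors; the `Obj` class and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class MarkSweepSketch {
    // A toy heap object: a name plus outgoing references.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Mark phase: everything reachable from the roots is live.
    static Set<Obj> mark(Collection<Obj> roots) {
        Set<Obj> live = Collections.newSetFromMap(new IdentityHashMap<Obj, Boolean>());
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (live.add(current)) {      // first visit only, so cycles terminate
                pending.addAll(current.refs);
            }
        }
        return live;
    }

    // Sweep phase: keep marked objects, drop the rest.
    static List<Obj> sweep(List<Obj> heap, Set<Obj> live) {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : heap) {
            if (live.contains(o)) {
                survivors.add(o);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);   // a -> b is reachable from the root
        c.refs.add(d);   // c <-> d is an unreachable cycle
        d.refs.add(c);
        List<Obj> heap = Arrays.asList(a, b, c, d);
        for (Obj o : sweep(heap, mark(Collections.singletonList(a)))) {
            System.out.println("live: " + o.name);
        }
    }
}
```

Note that the unreachable cycle `c <-> d` is collected: reachability from the roots decides liveness, not whether something still points at the object.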
Java Heap Structure
Java Heap is divided into generations based on the object lifetime.
Following is the general structure of the Java Heap. (This is mostly dependent
on the type of collector).
Young Generation
● Young Generation usually has Eden and Survivor spaces.
● All new objects are allocated in Eden Space.
● When this fills up, a minor GC happens.
● Surviving objects are first moved to survivor spaces.
● When objects survive several minor GCs (tenuring threshold), they
are eventually moved to the old generation.
Old Generation
● This stores long surviving objects.
● When this fills up, a major GC (full GC) happens.
● A major GC takes a longer time as it has to check all live objects.
Permanent Generation
● This has the metadata required by JVM.
● Classes and Methods are stored here.
● This space is included in a full GC.
Java 8 and PermGen
● Since Java 8, the permanent generation has been removed.
● The metadata is now stored in native memory, in an area called
“Metaspace”.
● There is no limit for Metaspace by default.
"Stop the World"
● For some events, JVM pauses all application threads. These are called
Stop-The-World (STW) pauses.
● GC Events also cause STW pauses.
● We can see application stopped time with GC logs.
GC Logging
There are JVM flags to log details for each GC. (Java 7 and 8)
-XX:+PrintGC - Print messages at garbage collection
-XX:+PrintGCDetails - Print more details at garbage collection
-XX:+PrintGCTimeStamps - Print timestamps at garbage collection
-XX:+PrintGCApplicationStoppedTime - Print the time for which the application threads were stopped
-XX:+PrintGCApplicationConcurrentTime - Print the time for which the application ran between pauses
GCViewer is a great tool to view GC logs.
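Besides GC logs, a running JVM also exposes basic GC counters through the standard `java.lang.management` API. The sketch below is an in-process complement to the logging flags, not a replacement for them; the class and method names are hypothetical, and the collector names printed depend on the JVM and the selected collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStatsExample {
    // Sums the collection counts of all collectors; a count of -1 means "undefined".
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", accumulated time (ms)=" + gc.getCollectionTime());
        }
        System.out.println("Total collections so far: " + totalCollections());
    }
}
```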
Java Memory Usage
● Init - initial amount of memory that the JVM requests from the OS for
memory management during startup.
● Used - amount of memory currently used
● Committed - amount of memory that is guaranteed to be available for use
by the JVM
● Max - maximum amount of memory that can be used for memory
management.
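The four values above map directly onto `java.lang.management.MemoryUsage`. A minimal sketch that reads them for the heap of the current JVM (the class and method names are hypothetical):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapUsageExample {
    static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        // All values are in bytes; init and max may be -1 if undefined.
        System.out.println("init      = " + heap.getInit());
        System.out.println("used      = " + heap.getUsed());
        System.out.println("committed = " + heap.getCommitted());
        System.out.println("max       = " + heap.getMax());
    }
}
```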
Java Tools
JDK Tools and Utilities
● Basic Tools (java, javac, jar)
● Security Tools (jarsigner, keytool)
● Java Web Service Tools (wsimport, wsgen)
● Java Troubleshooting, Profiling, Monitoring and Management Tools (jcmd,
jconsole, jmc, jvisualvm)
Java Troubleshooting, Profiling, Monitoring and
Management Tools
● jcmd - JVM Diagnostic Commands tool
● jconsole - A JMX-compliant graphical tool for monitoring a Java
application
● jvisualvm – Provides detailed information about the Java application. It
provides CPU & Memory profiling, heap dump analysis, memory leak
detection etc.
● jmc – Tools to monitor and manage Java applications without introducing
performance overhead
Java Experimental Tools
Monitoring Tools
● jps – JVM Process Status Tool
● jstat – JVM Statistics Monitoring Tool
Troubleshooting Tools
● jmap - Memory Map for Java
● jhat - Heap Dump Browser
● jstack – Stack Trace for Java
Java Ergonomics and JVM Flags
Java Ergonomics and JVM Flags
● Java Virtual Machine can tune itself depending on the environment and
this smart tuning is referred to as Ergonomics.
● When tuning Java, it's important to know which values were used as
default for Garbage collector, Heap Sizes, Runtime Compiler by Java
Ergonomics
○ java -XshowSettings:vm -version
Printing Command Line Flags
We can use "-XX:+PrintCommandLineFlags" to print the command line flags
used by the JVM.
This is a useful flag to see the values selected by Java Ergonomics.
eg:
$ java -XX:+PrintCommandLineFlags -version
-XX:InitialHeapSize=126516992 -XX:MaxHeapSize=2024271872 -XX:+PrintCommandLineFlags
-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
Printing Initial & Final JVM Flags
Use the following command to see the default values.
java -XX:+PrintFlagsInitial -version
Use the following command to see the final values.
java -XX:+PrintFlagsFinal -version
The values modified manually or by Java Ergonomics are shown with “:=”.
java -XX:+PrintFlagsFinal -version | grep ':='
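From inside an application, a rough view of the same information is available through standard APIs. This sketch only lists the arguments that were explicitly passed to the JVM plus a few effective values; it does not show every flag the way `-XX:+PrintFlagsFinal` does. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JvmSettingsExample {
    // JVM arguments given on the command line (excluding arguments to main).
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + inputArguments());
        // Effective values chosen manually or by ergonomics.
        System.out.println("Max heap (bytes): " + Runtime.getRuntime().maxMemory());
        System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());
    }
}
```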
Java Flags
Java has a lot of tuning options:
$ java -XX:+UnlockCommercialFeatures -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions
-XX:+PrintFlagsFinal -version | head -n 10
[Global flags]
uintx AdaptiveSizeDecrementScaleFactor = 4 {product}
uintx AdaptiveSizeMajorGCDecayTimeScale = 10 {product}
uintx AdaptiveSizePausePolicy = 0 {product}
uintx AdaptiveSizePolicyCollectionCostMargin = 50 {product}
uintx AdaptiveSizePolicyInitializingSteps = 20 {product}
uintx AdaptiveSizePolicyOutputInterval = 0 {product}
uintx AdaptiveSizePolicyWeight = 10 {product}
uintx AdaptiveSizeThroughPutPolicy = 0 {product}
uintx AdaptiveTimeWeight = 25 {product}
java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
Profiling Tools
Java Profiling Tools Available in JDK
● Java VisualVM
● Java Mission Control
Other Java Profiling Tools
● JProfiler - A commercially licensed Java profiling tool developed by
ej-technologies
● Honest Profiler - A sampling JVM profiler without the safepoint sample
bias
● Async Profiler - Sampling CPU and HEAP profiler for Java featuring
AsyncGetCallTrace + perf_events
Java Profiling Tools
Survey by RebelLabs in 2016: http://pages.zeroturnaround.com/RebelLabs-Developer-Productivity-Report-2016.html
Attitude toward performance work
Survey by RebelLabs in 2017:
https://zeroturnaround.com/rebellabs/developer-productivity-survey-2017/
Measuring Methods for CPU Profiling
● Sampling: Monitor running code externally and check which code is
executed
● Instrumentation: Include measurement code into the real code
Sampling
main()
foo()
bar()
Instrumentation
main()
foo()
bar()
Sampling vs. Instrumentation
Sampling
● Overhead depends on the sampling interval
● Stable overhead
● Can see execution hotspots
● Can miss methods that return faster than the sampling interval
● Can discover unknown code
Instrumentation
● Precise measurement of execution times
● No stable overhead
● More data to process
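As a rough illustration of instrumentation, measurement code can be wrapped around the real code by hand. Real profilers typically inject such code into the bytecode instead; the `timed` helper and the workload below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class ManualInstrumentation {
    static final Map<String, Long> callCounts = new HashMap<>();
    static final Map<String, Long> totalNanos = new HashMap<>();

    // Runs the body and records one call and its elapsed time under the given name.
    static <T> T timed(String name, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            callCounts.merge(name, 1L, Long::sum);
            totalNanos.merge(name, elapsed, Long::sum);
        }
    }

    // Example workload: sum of the integers 1..n.
    static long sumTo(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            timed("sumTo", () -> sumTo(1_000_000));
        }
        System.out.println("sumTo calls: " + callCounts.get("sumTo")
                + ", total ns: " + totalNanos.get("sumTo"));
    }
}
```

Every call pays for two `System.nanoTime()` reads and two map updates, so the overhead grows with the call rate. This is one reason instrumentation has no stable overhead.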
Sampling vs. Instrumentation
● Java VisualVM uses both sampling and instrumentation
● Java Flight Recorder uses sampling for hot methods
● JProfiler supports both sampling and instrumentation
How Do Profilers Work?
● Generic profilers rely on the JVMTI spec
● JVMTI offers only safepoint sampling stack trace collection options
● Some profilers use the AsyncGetCallTrace method, which is an OpenJDK
internal API call to facilitate non-safepoint collection of stack traces
Safepoints
● A safepoint is a moment in time when a thread’s data, its internal state
and representation in the JVM are, well, safe for observation by other
threads in the JVM.
○ Between every 2 bytecodes (interpreter mode)
○ Backedge of non-’counted’ loops
○ Method exit
○ JNI call exit
Problems with Profiling
● Runtime Overhead
● Interpretation of the results can be difficult
● Identifying the “crucial” parts of the software
● Identifying potential performance improvements
Profiling Applications with Java VisualVM
● CPU Profiling: Profile the performance of the application.
● Memory Profiling: Analyze the memory usage of the application.
Java Mission Control
● A set of powerful tools running on the Oracle JDK to monitor and manage
Java applications
● Free for development use (Oracle Binary Code License)
● Available in JDK since Java 7 update 40
● Supports Plugins
● Two main tools
○ JMX Console
○ Java Flight Recorder
Java Flight Recorder (JFR)
Java Flight Recorder (JFR)
● A profiling and event collection framework built into the Oracle JDK
● Gathers low-level information about the JVM and application behaviour
with very low performance overhead (less than 2%)
● Always on Profiling in Production Environments
● Engine was released with Java 7 update 4
● Commercial feature in Oracle JDK
● A main tool in Java Mission Control (since Java 7 update 40)
JFR Events
JFR collects data about events.
JFR collects information about three types of events:
1. Instant events – Events occurring instantly
2. Sample (Requestable) events – Events with a user configurable period to
provide a sample of system activity
3. Duration events – Events taking some time to occur. The event has a start
and end time. You can set a threshold.
Java Flight Recorder Architecture
JFR is comprised of the following components:
1. JFR runtime - The recording engine inside the JVM that produces the
recordings.
2. Flight Recorder plugin for Java Mission Control (JMC)
Enabling Java Flight Recorder
Since JFR is a commercial feature, we must unlock commercial features before
trying to run JFR.
So, you need the following arguments.
-XX:+UnlockCommercialFeatures
-XX:+FlightRecorder
Dynamically enabling JFR
If you are using Java 8 update 40 (8u40) or later, you can now dynamically
enable JFR.
This is useful as we don’t need to restart the server.
Sometimes a restart solves the problem anyway. :) But that’s just temporary
and it’s always good to analyze the root cause of the problem.
Improving the accuracy of JFR Method Profiler
An important feature of the JFR Method Profiler is that it does not require
threads to be at safepoints in order for stacks to be sampled.
Generally, stacks are walked only at safepoints.
By default, however, the HotSpot JVM doesn’t provide metadata for the
non-safepoint parts of the code.
Use the following to improve the accuracy.
-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints
JFR Event Settings
There are two event settings by default in Oracle JDK.
Files are in $JAVA_HOME/jre/lib/jfr
1. Continuous - default.jfc
2. Profiling - profile.jfc
JFR Recording Types
Time Fixed Recordings
● Fixed duration
● The recording will be opened automatically in JMC at the end (If the
recording was started by JMC)
Continuous Recordings
● No end time
● Must be explicitly dumped
Running Java Flight Recorder
There are a few ways to run JFR.
1. Using the JFR plugin in JMC
2. Using the command line
3. Using the Diagnostic Command
Running Java Flight Recorder
You can run multiple recordings concurrently and have different settings for
each recording.
However, the JFR runtime will use the same buffers, and the resulting recording
contains the union of all events for all recordings active at that particular time.
This means that we might get more than we asked for (but not less).
Running JFR from JMC
Right click on JVM and select “Start Flight Recording”
Select the type of recording: Time fixed / Continuous
Select the “Event Settings” template
Modify the event options for the selected flight recording template (Optional)
Modify the event details (Optional)
Running JFR from Command Line
To produce a Flight Recording from the command line, you can use the
“-XX:StartFlightRecording” option. Eg:
-XX:StartFlightRecording=delay=20s,duration=60s,name=Test,filename=recording.jfr,settings=profile
Use the following to change the log level.
-XX:FlightRecorderOptions=loglevel=info
The Default Recording (Continuous Recording)
You can also start a continuous recording from the command line using
-XX:FlightRecorderOptions.
-XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=/tmp,maxage=6h,settings=default
The default recording can be dumped on exit. Only the default recording can be
used with the dumponexit and dumponexitpath parameters.
-XX:FlightRecorderOptions=defaultrecording=true,dumponexit=true,dumponexitpath=/tmp/dumponexit.jfr
Running JFR using Diagnostic Commands
The command “jcmd” can be used.
Start Recording Example:
jcmd <pid> JFR.start delay=20s duration=60s name=MyRecording filename=/tmp/recording.jfr settings=profile
Check recording
jcmd <pid> JFR.check
Dump Recording
jcmd <pid> JFR.dump filename=/tmp/dump.jfr name=MyRecording
Analyzing Flight Recordings
The JFR runtime engine dumps recorded data to files with the *.jfr extension.
These binary files can be viewed from JMC.
There are tab groups showing certain aspects of the JVM and the Java
application runtime such as Memory, Threads, I/O etc.
JFR Tab Groups
● General - Details of the JVM, the system, and the recording.
● Memory - Information about memory & garbage collection.
● Code - Information about methods, exceptions, compilations, and class
loading.
● Threads - Information about threads and locks.
● I/O - Information about file and socket I/O.
● System - Information about the environment.
● Events - Information about the event types in the recording.
Allocation Profiling
● Finding out where the allocations happen in your application.
● If there are more allocations, JVM will have to run garbage collection more
often
Sample applications
Let’s try some sample applications
https://github.com/chrishantha/sample-java-programs
● Hot Methods Application
● High CPU Application
● Allocations Application
● Latencies Application
Java Just-In-Time (JIT) compiler
Java Just-In-Time (JIT) compiler
Java code is usually compiled into platform-independent bytecode (class files).
The JVM is able to load the class files and execute the Java bytecode via the
Java interpreter.
Even though this bytecode is usually interpreted, it might also be compiled
into native machine code using the JVM's Just-In-Time (JIT) compiler.
Java Just-In-Time (JIT) compiler
Unlike a normal compiler, the JIT compiler compiles the code (bytecode)
only when required. With the JIT compiler, the JVM monitors the methods
executed by the interpreter and identifies the “hot methods” for compilation.
After identifying the hot methods, the JVM compiles their bytecode into
more efficient native code.
In this way, the JVM can avoid interpreting a method each time during the
execution and thereby improves the runtime performance of the application.
JIT Optimization Techniques
● Dead Code Elimination
○ Null Check Elimination
● Branch Prediction
● Loop Unrolling
● Inlining Methods
JITWatch
The JITWatch tool can analyze the compilation logs generated with the
“-XX:+LogCompilation” flag.
The logs generated by LogCompilation are XML-based and have a lot of
information related to JIT compilation. Hence these files can be very large.
https://github.com/AdoptOpenJDK/jitwatch
Premature Optimizations
“We should forget about small efficiencies, say
about 97% of the time: premature
optimization is the root of all evil. Yet we
should not pass up our opportunities in that
critical 3%."
- Donald Knuth
Image is from: http://wiki.c2.com/?DonKnuth
Premature Optimizations
● You shouldn’t:
○ Manually inline methods.
○ Write code directly in bytecode.
○ Allocate public variables and use them as global memory throughout an application.
Flame Graphs
Flame Graphs
● “Flame graphs are a visualization of profiled software, allowing the most
frequent code-paths to be identified quickly and accurately.”
● Developed by Brendan Gregg, an industry expert in computing
performance and cloud computing.
● Flame Graphs can be generated using
https://github.com/brendangregg/FlameGraph
○ This creates an interactive SVG
http://www.brendangregg.com/flamegraphs.html
Flame Graph Example
Flame Graph: Definition
● The x-axis shows the stack profile population, sorted alphabetically
● The y-axis shows stack depth
○ The top edge shows what is on-CPU, and beneath it is its ancestry
● Each rectangle represents a stack frame.
● Box width is proportional to the total time a function was profiled directly
or its children were profiled
● The colors are usually not significant, picked randomly to differentiate
frames.
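The width rule can be sketched as a small computation over "folded" stacks, the semicolon-separated `frame;frame;frame count` lines produced by the stackcollapse scripts in the FlameGraph repository. The input lines and the class name are hypothetical; the sketch only shows how a frame's width follows from the samples of the frame and its children.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FlameWidths {
    // For each stack prefix (a frame in its calling context), counts the samples
    // in which that frame or one of its children was profiled.
    static Map<String, Long> inclusiveSamples(String[] foldedLines) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String line : foldedLines) {
            int space = line.lastIndexOf(' ');
            long count = Long.parseLong(line.substring(space + 1));
            String[] frames = line.substring(0, space).split(";");
            StringBuilder prefix = new StringBuilder();
            for (String frame : frames) {
                if (prefix.length() > 0) {
                    prefix.append(';');
                }
                prefix.append(frame);
                totals.merge(prefix.toString(), count, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical profile with 10 samples in total.
        String[] folded = {"main;foo;bar 6", "main;foo 2", "main;baz 2"};
        long total = 10;
        for (Map.Entry<String, Long> e : inclusiveSamples(folded).entrySet()) {
            System.out.printf("%s %.0f%%%n", e.getKey(), 100.0 * e.getValue() / total);
        }
    }
}
```

Here `main;foo` gets 80% of the width: 2 samples directly in `foo` plus 6 in its child `bar`.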
Types of Flame Graphs
● CPU - see which code-paths are hot (busy on-CPU)
● Memory - Memory Leak (and Growth)
● Off-CPU - Time spent by processes and threads when they are not
running on-CPU
● Hot/Cold - both CPU and Off-CPU
● Differential - compare before and after flame graphs
Why do we need Flame Graphs?
● Finding out why CPUs are busy is an important task when troubleshooting
performance issues
● You can use a sampling profiler to see which code-paths are hot.
● Usually, a profiler will dump a lot of data with thousands of lines.
● A Flame Graph visualizes the stack trace output of a sampling
profiler.
Naive Profiling: Taking Thread Dumps
● “A thread dump is a snapshot of the state of all threads that are part of
the process.”
● The state of the thread is represented with a stack trace.
● A thread can be in only one state at a given point in time.
● You can take thread dumps at regular intervals to do “Naive Java Profiling”
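The same naive approach can be sketched inside a JVM with `Thread.getAllStackTraces()`, counting how often each stack is seen. This is an assumption-laden illustration: it samples its own JVM rather than an external process as jstack does, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class NaiveSampler {
    // Folds a stack trace into "root;...;leaf" form. StackTraceElement arrays
    // start at the innermost frame, so the array is walked in reverse.
    static String fold(StackTraceElement[] stack) {
        StringBuilder sb = new StringBuilder();
        for (int i = stack.length - 1; i >= 0; i--) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(stack[i].getClassName()).append('.').append(stack[i].getMethodName());
        }
        return sb.toString();
    }

    // Takes one sample of all live threads and adds it to the counts.
    static void sample(Map<String, Integer> counts) {
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            if (stack.length > 0) {
                counts.merge(fold(stack), 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < 5; i++) {   // five samples, 100 ms apart
            sample(counts);
            Thread.sleep(100);
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Each output line has the form `frame;frame;frame count`, which is the folded format that flamegraph.pl reads.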
Sample program to profile
● Get Sample “highcpu” program from
https://github.com/chrishantha/sample-java-programs
● mvn clean install
● cd highcpu
● java -jar target/highcpu.jar --help
Flame Graph with Thread Dumps
i=0; while (( i++ < 30 )); do jstack $(pgrep -f highcpu) >> out.jstacks; sleep 2; done
cat out.jstacks | $FLAMEGRAPH_DIR/stackcollapse-jstack.pl > out.stacks-folded
cat out.stacks-folded | $FLAMEGRAPH_DIR/flamegraph.pl > jstack_flamegraph.svg
firefox jstack_flamegraph.svg
Flame Graph with Thread Dumps
Flame Graph with Thread Dumps (Without Thread Names)
[Figure annotations: the top edge shows the methods directly on-CPU; visually
compare lengths; ancestry; code path; branches]
Flame Graphs with Java Flight Recordings
● We can generate CPU Flame Graphs from a Java Flight Recording
● Program is available at GitHub:
https://github.com/chrishantha/jfr-flame-graph
● The program uses the (unsupported) JMC Parser
Generating a Flame Graph using JFR dump
● JFR has Method Profiling Samples
○ You can view those in “Hot Methods” and “Call Tree” tabs
● A Flame Graph can be generated using these Method Profiling Samples
● Use the following to improve the accuracy of the JFR Method Profiler.
○ -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints
Profiling the Sample Program
● Get a Profiling Recording
○ java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints \
  -XX:+UnlockCommercialFeatures -XX:+FlightRecorder \
  -XX:StartFlightRecording=delay=10s,duration=1m,name=Profiling,filename=highcpu_profiling.jfr,settings=profile \
  -jar target/highcpu.jar --hashing-algo SHA-512 --hashing-workers 20 --math-workers 10
Profiling a Sample Program
Get Sample “highcpu” program from
https://github.com/chrishantha/sample-java-programs
Get a Profiling Recording
java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints \
  -XX:+UnlockCommercialFeatures -XX:+FlightRecorder \
  -XX:StartFlightRecording=delay=5s,duration=1m,name=Profiling,filename=highcpu_profiling.jfr,settings=profile \
  -jar target/highcpu.jar
Using jfr-flame-graph
create_flamegraph.sh -f highcpu_profiling.jfr -i > flamegraph.svg
Tree View (in JFR)
Using jfr-flame-graph
create_flamegraph.sh -f highcpu_profiling.jfr -i > jfr_flamegraph.svg
Java Mixed-Mode Flame Graphs
● With Java profilers, we can get information about the Java process only.
● However, with Java Mixed-Mode Flame Graphs, we can see how much CPU
time is spent in Java methods, system libraries and the kernel.
● Mixed-mode means that the Flame Graph shows profile information from
both system code paths and Java code paths.
Linux Perf (perf_events)
● System profiler
● Userspace + Kernel
Installing “perf_events” on Ubuntu
● On terminal, type perf
● sudo apt install linux-tools-common
● sudo apt install linux-tools-generic
The Problem with Java and Perf
● perf needs the Java symbol table. The JVM doesn’t preserve frame pointers by
default.
● Run sample program
○ java -jar target/highcpu.jar --hashing-algo SHA-512 --hashing-workers 20 --math-workers 10 --exit-timeout 300
● Run perf record
○ sudo perf record -F 99 -g -p `pgrep -f highcpu` -- sleep 60
● Display trace output
○ sudo perf script
No Java Frames!
Preserving Frame Pointers in JVM
● Run java program with the JVM flag "-XX:+PreserveFramePointer"
○ java -XX:+PreserveFramePointer -jar target/highcpu.jar --hashing-algo SHA-512 \
  --hashing-workers 20 --math-workers 10 --exit-timeout 300
● This flag works only on JDK 8 update 60 and above.
● Some frames may still be missing when compared to Flame Graphs
generated from JFR or jstack due to “inlining”.
● You can reduce the amount of inlining if you need to see more frames in the
profile.
○ For example, -XX:InlineSmallCode=500
Preserving Frame Pointers in JVM
Run java program with the JVM flag "-XX:+PreserveFramePointer"
java -XX:+PreserveFramePointer -jar target/highcpu.jar --exit-timeout 600
This flag works only on JDK 8 update 60 and above.
How to generate Java symbol table
● Use a java agent to generate method mappings to use with the linux
`perf` tool
○ Clone & Build https://github.com/jvm-profiling-tools/perf-map-agent
● Create symbol map
○ ./create-java-perf-map.sh `pgrep -f highcpu`
● You can also use “jmaps” tool in FlameGraph repository to create symbol
files for all Java processes.
○ export AGENT_HOME=/home/isuru/performance/git-projects/perf-map-agent
○ sudo perf record -F 499 -a -g -- sleep 30;sudo -E $FLAMEGRAPH_DIR/jmaps
● Let Java “warm up” before getting symbol maps.
Generate Java Mixed-Mode Flame Graph
● Run perf and create symbol map
○ export AGENT_HOME=/home/isuru/performance/git-projects/perf-map-agent
○ sudo perf record -F 499 -a -g -- sleep 30; sudo -E $FLAMEGRAPH_DIR/jmaps
● Generate Flame Graph
○ sudo perf script -F comm,pid,tid,cpu,time,event,ip,sym,dso,trace |
  stackcollapse-perf.pl --pid | grep java-`pgrep -f highcpu` |
  flamegraph.pl --color=java --hash --width 1080 > java-mixed-mode.svg
○ firefox java-mixed-mode.svg
Java Mixed-Mode Flame Graph
Java Mixed-Mode Flame Graph for Netty
Java Mixed-Mode Flame Graph
● Helps to understand Java CPU Usage
● With Flame Graphs, we can see both java and system profiles
● Can profile GC as well
Linux Profiling
We can use “perf”, which is a Linux Profiler with performance counters to
profile system code paths.
Linux perf command is also called perf_events
Some perf commands:
perf stat: obtain event counts
perf record: record events for later reporting
perf report: break down events by process, function, etc.
perf top: see live event count
Does profiling matter?
● Yes!
● Most of the performance issues are in the application code.
● Early performance testing is key. Fix problems while developing.
Thank you!

More Related Content

What's hot

java memory management & gc
java memory management & gcjava memory management & gc
java memory management & gcexsuns
 
"What's New in HotSpot JVM 8" @ JPoint 2014, Moscow, Russia
"What's New in HotSpot JVM 8" @ JPoint 2014, Moscow, Russia "What's New in HotSpot JVM 8" @ JPoint 2014, Moscow, Russia
"What's New in HotSpot JVM 8" @ JPoint 2014, Moscow, Russia Vladimir Ivanov
 
JCConf 2020 - New Java Features Released in 2020
JCConf 2020 - New Java Features Released in 2020JCConf 2020 - New Java Features Released in 2020
JCConf 2020 - New Java Features Released in 2020Joseph Kuo
 
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...Red Hat Developers
 
Intrinsic Methods in HotSpot VM
Intrinsic Methods in HotSpot VMIntrinsic Methods in HotSpot VM
Intrinsic Methods in HotSpot VMKris Mok
 
JMC/JFR: Kotlin spezial
JMC/JFR: Kotlin spezialJMC/JFR: Kotlin spezial
JMC/JFR: Kotlin spezialMiro Wengner
 
Java Colombo Meetup: Java Mission Control & Java Flight Recorder
Java Colombo Meetup: Java Mission Control & Java Flight RecorderJava Colombo Meetup: Java Mission Control & Java Flight Recorder
Java Colombo Meetup: Java Mission Control & Java Flight RecorderIsuru Perera
 
Introduction to Real Time Java
Introduction to Real Time JavaIntroduction to Real Time Java
Introduction to Real Time JavaDeniz Oguz
 
Analyzing Java Applications Using Thermostat (Omair Majid)
Analyzing Java Applications Using Thermostat (Omair Majid)Analyzing Java Applications Using Thermostat (Omair Majid)
Analyzing Java Applications Using Thermostat (Omair Majid)Red Hat Developers
 
Java Flight Recorder Behind the Scenes
Java Flight Recorder Behind the ScenesJava Flight Recorder Behind the Scenes
Java Flight Recorder Behind the ScenesStaffan Larsen
 
Java Performance Tuning
Java Performance TuningJava Performance Tuning
Java Performance TuningMinh Hoang
 
Java performance tuning
Java performance tuningJava performance tuning
Java performance tuningJerry Kurian
 
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)Red Hat Developers
 
jcmd #javacasual
jcmd #javacasualjcmd #javacasual
jcmd #javacasualYuji Kubota
 
JVM Performance Tuning
JVM Performance TuningJVM Performance Tuning
JVM Performance TuningJeremy Leisy
 
JVM and Garbage Collection Tuning
JVM and Garbage Collection TuningJVM and Garbage Collection Tuning
JVM and Garbage Collection TuningKai Koenig
 
NovaProva, a new generation unit test framework for C programs
NovaProva, a new generation unit test framework for C programsNovaProva, a new generation unit test framework for C programs
NovaProva, a new generation unit test framework for C programsGreg Banks
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugginglibfetion
 
Flame Graphs for MySQL DBAs - FOSDEM 2022 MySQL Devroom
Flame Graphs for MySQL DBAs - FOSDEM 2022 MySQL DevroomFlame Graphs for MySQL DBAs - FOSDEM 2022 MySQL Devroom
Flame Graphs for MySQL DBAs - FOSDEM 2022 MySQL DevroomValeriy Kravchuk
 
TIP1 - Overview of C/C++ Debugging/Tracing/Profiling Tools
TIP1 - Overview of C/C++ Debugging/Tracing/Profiling ToolsTIP1 - Overview of C/C++ Debugging/Tracing/Profiling Tools
TIP1 - Overview of C/C++ Debugging/Tracing/Profiling ToolsXiaozhe Wang
 


Software Profiling: Java Performance, Profiling and Flamegraphs

  • 1. Software Profiling Java Performance, Profiling and Flamegraphs M. Isuru Tharanga Chrishantha Perera, Technical Lead at WSO2, Co-organizer of Java Colombo Meetup
  • 2. Software Profiling ● Profiling can help you to analyze the performance of your applications and improve poorly performing sections in your code
  • 3. Software Profiling Wikipedia definition: In software engineering, profiling ("program profiling", "software profiling") is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization. https://en.wikipedia.org/wiki/Profiling_(computer_programming)
  • 4. Software Profiling Wikipedia definition: Profiling is achieved by instrumenting either the program source code or its binary executable form using a tool called a profiler (or code profiler). Profilers may use a number of different techniques, such as event-based, statistical, instrumented, and simulation methods. https://en.wikipedia.org/wiki/Profiling_(computer_programming)
  • 6. Measuring Performance We need a way to measure performance: ● To understand how the system behaves ● To see performance improvements after doing any optimizations There are two key performance metrics: ● Response Time/Latency ● Throughput
  • 7. Throughput Throughput measures the number of requests that a server processes during a specific time interval (e.g. per second). Throughput is calculated using the equation: Throughput = number of requests / time to complete the requests
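The throughput equation above can be sketched in a few lines of Java. This is only an illustration: the class name `ThroughputExample`, the method `requestsPerSecond`, and the sample numbers are assumptions, not part of the original slides.

```java
import java.util.concurrent.TimeUnit;

/** Minimal sketch of the throughput equation; all names and numbers are hypothetical. */
final class ThroughputExample {

    /** Throughput = number of requests / time to complete the requests (in seconds). */
    static double requestsPerSecond(long requests, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        double elapsedSeconds = elapsedNanos / (double) TimeUnit.SECONDS.toNanos(1);
        return requests / elapsedSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical load test: 12,000 requests completed in 60 seconds.
        long elapsedNanos = TimeUnit.SECONDS.toNanos(60);
        System.out.println(requestsPerSecond(12_000, elapsedNanos) + " requests/second");
    }
}
```

In a real benchmark the elapsed time would typically come from two `System.nanoTime()` readings taken around the test run.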
  • 8. Response Time/Latency Response time is the end-to-end processing time for an operation.
  • 9. Benchmarking Tools ● Apache JMeter ● Gatling ● wrk - HTTP benchmarking tool ● Vegeta - HTTP load testing tool
  • 10. Tuning Java Applications ● We want very high throughput and very low latency. ● There is a tradeoff between throughput and latency. With more concurrent users, the throughput increases, but the average latency will also increase. ● Usually, you need to achieve maximum throughput while keeping latency within some acceptable limit. For example, you might choose the maximum throughput in a range where latency stays below 10 ms.
  • 11. Throughput and Latency Graphs Source: https://www.infoq.com/articles/Tuning-Java-Servers
  • 12. Response Time/Latency Distribution When measuring response time, it’s important to look at the whole distribution: min, max, avg, median, 75th percentile, 98th percentile, 99th percentile, etc.
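To make the distribution concrete, here is a minimal sketch that computes the statistics named above from a handful of latency samples. It assumes the nearest-rank definition of a percentile (load-testing tools may interpolate differently), and the class name and sample values are hypothetical.

```java
import java.util.Arrays;

/** Sketch of latency distribution statistics; sample data is made up for illustration. */
final class LatencyDistribution {

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] sortedLatencies, double p) {
        if (sortedLatencies.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        int rank = (int) Math.ceil(p / 100.0 * sortedLatencies.length);
        return sortedLatencies[rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical response times in milliseconds, including one slow outlier.
        long[] latencies = {12, 9, 11, 10, 250, 13, 10, 11, 9, 12};
        Arrays.sort(latencies);

        double avg = Arrays.stream(latencies).average().orElse(0);
        System.out.println("min=" + latencies[0]
                + " max=" + latencies[latencies.length - 1]
                + " avg=" + avg);
        System.out.println("median=" + percentile(latencies, 50)
                + " p75=" + percentile(latencies, 75)
                + " p98=" + percentile(latencies, 98)
                + " p99=" + percentile(latencies, 99));
    }
}
```

With these samples the single outlier pulls the average well above the median, which is why the average alone can hide what most requests experience.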
  • 13. Longtail latencies Long-tail latency occurs when high percentiles have values much greater than the average latency. Source: https://engineering.linkedin.com/performance/who-moved-my-99th-percentile-latency
  • 14. Latency Numbers Every Programmer Should Know
      L1 cache reference                            0.5 ns
      Branch mispredict                               5 ns
      L2 cache reference                              7 ns                            14x L1 cache
      Mutex lock/unlock                              25 ns
      Main memory reference                         100 ns                            20x L2 cache, 200x L1 cache
      Compress 1K bytes with Zippy                3,000 ns        3 us
      Send 1K bytes over 1 Gbps network          10,000 ns       10 us
      Read 4K randomly from SSD*                150,000 ns      150 us                ~1GB/sec SSD
      Read 1 MB sequentially from memory        250,000 ns      250 us
      Round trip within same datacenter         500,000 ns      500 us
      Read 1 MB sequentially from SSD*        1,000,000 ns    1,000 us      1 ms      ~1GB/sec SSD, 4X memory
      Disk seek                              10,000,000 ns   10,000 us     10 ms      20x datacenter roundtrip
      Read 1 MB sequentially from disk       20,000,000 ns   20,000 us     20 ms      80x memory, 20X SSD
      Send packet CA->Netherlands->CA       150,000,000 ns  150,000 us    150 ms
  • 15. Why do we need Profiling? ● Improve throughput (maximizing the transactions processed per second) ● Improve latency (minimizing the time taken for each operation) ● Find performance bottlenecks
  • 17. Java Garbage Collection ● Java automatically allocates memory for our applications and automatically deallocates memory when certain objects are no longer used. ● "Automatic Garbage Collection" is an important feature in Java. ● As Java developers, we don't have to worry about memory allocations/deallocations because Java manages memory for us.
  • 18. Marking and Sweeping Away Garbage ● GC works by first marking all used objects in the heap and then deleting unused objects. ● GC also compacts the memory after deleting unreferenced objects to make new memory allocations much easier and faster.
  • 19. GC roots ● The JVM holds references to GC roots, which refer to the application objects in a tree structure. There are several kinds of GC roots in Java. ○ Local Variables ○ Active Java Threads ○ Static variables ○ JNI references ● Every object that can be reached from these GC roots is part of the reachable tree, which is how the GC determines which objects are live.
  • 20. Java Heap Structure The Java heap is divided into generations based on object lifetime. The following is the general structure of the Java heap. (This mostly depends on the type of collector.)
  • 21. Young Generation ● Young Generation usually has Eden and Survivor spaces. ● All new objects are allocated in Eden Space. ● When this fills up, a minor GC happens. ● Surviving objects are first moved to survivor spaces. ● When objects survive several minor GCs (the tenuring threshold), they are eventually moved to the old generation.
  • 22. Old Generation ● This stores long surviving objects. ● When this fills up, a major GC (full GC) happens. ● A major GC takes a longer time as it has to check all live objects.
  • 23. Permanent Generation ● This has the metadata required by JVM. ● Classes and Methods are stored here. ● This space is included in a full GC.
  • 24. Java 8 and PermGen ● Since Java 8, the permanent generation no longer exists. ● The metadata has moved to native memory, to an area called “Metaspace”. ● There is no limit for Metaspace by default (a limit can be set with -XX:MaxMetaspaceSize).
  • 25. "Stop the World" ● For some events, JVM pauses all application threads. These are called Stop-The-World (STW) pauses. ● GC Events also cause STW pauses. ● We can see application stopped time with GC logs.
  • 26. GC Logging There are JVM flags to log details for each GC (Java 7 and 8):
      -XX:+PrintGC - Print messages at garbage collection
      -XX:+PrintGCDetails - Print more details at garbage collection
      -XX:+PrintGCTimeStamps - Print timestamps at garbage collection
      -XX:+PrintGCApplicationStoppedTime - Print the application GC stopped time
      -XX:+PrintGCApplicationConcurrentTime - Print the application GC concurrent time
    GCViewer is a great tool for viewing GC logs.
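As a complement to GC logs (and not a replacement for the flags listed above), collection counts and accumulated collection times can also be read from inside the process through the standard `java.lang.management.GarbageCollectorMXBean`. The sketch below assumes nothing beyond the JDK; the class name `GcStatsExample` is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: read per-collector GC statistics via the platform MXBeans. */
final class GcStatsExample {

    /** Sums collection counts over all collectors, skipping collectors that report -1 (undefined). */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is an approximate accumulated elapsed time in milliseconds;
            // it is not the same thing as the application stopped time printed in GC logs.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", approx. time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("total collections so far: " + totalCollections());
    }
}
```

The collector names printed depend on which garbage collector the JVM selected.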
  • 27. Java Memory Usage ● Init - initial amount of memory that the JVM requests from the OS for memory management during startup. ● Used - amount of memory currently used ● Committed - amount of memory that is guaranteed to be available for use by the JVM ● Max - maximum amount of memory that can be used for memory management.
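The four values above map directly onto `java.lang.management.MemoryUsage`. A minimal sketch of reading them for the heap follows; the class name `HeapUsageExample` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Sketch: print init, used, committed and max for the Java heap. */
final class HeapUsageExample {

    static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        // All values are in bytes. getInit() and getMax() return -1 when undefined.
        System.out.println("init=" + heap.getInit());
        System.out.println("used=" + heap.getUsed());
        System.out.println("committed=" + heap.getCommitted());
        System.out.println("max=" + heap.getMax());
    }
}
```

The same `MemoryUsage` type is returned by `getNonHeapMemoryUsage()` for non-heap memory.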
  • 29. JDK Tools and Utilities ● Basic Tools (java, javac, jar) ● Security Tools (jarsigner, keytool) ● Java Web Service Tools (wsimport, wsgen) ● Java Troubleshooting, Profiling, Monitoring and Management Tools (jcmd, jconsole, jmc, jvisualvm)
  • 30. Java Troubleshooting, Profiling, Monitoring and Management Tools ● jcmd - JVM Diagnostic Commands tool ● jconsole - A JMX-compliant graphical tool for monitoring a Java application ● jvisualvm – Provides detailed information about the Java application. It provides CPU & Memory profiling, heap dump analysis, memory leak detection etc. ● jmc – Tools to monitor and manage Java applications without introducing performance overhead
  • 31. Java Experimental Tools Monitoring Tools ● jps – JVM Process Status Tool ● jstat – JVM Statistics Monitoring Tool Troubleshooting Tools ● jmap - Memory Map for Java ● jhat - Heap Dump Browser ● jstack – Stack Trace for Java
  • 32. Java Ergonomics and JVM Flags
  • 33. Java Ergonomics and JVM Flags ● The Java Virtual Machine can tune itself depending on the environment, and this smart tuning is referred to as Ergonomics. ● When tuning Java, it's important to know which default values Java Ergonomics selected for the garbage collector, heap sizes, and runtime compiler ○ java -XshowSettings:vm -version
  • 34. Printing Command Line Flags We can use "-XX:+PrintCommandLineFlags" to print the command line flags used by the JVM. This is a useful flag to see the values selected by Java Ergonomics. For example:
      $ java -XX:+PrintCommandLineFlags -version
      -XX:InitialHeapSize=126516992 -XX:MaxHeapSize=2024271872 -XX:+PrintCommandLineFlags -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
      java version "1.8.0_172"
      Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
      Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
  • 35. Printing Initial & Final JVM Flags Use the following command to see the default values:
      java -XX:+PrintFlagsInitial -version
    Use the following command to see the final values:
      java -XX:+PrintFlagsFinal -version
    The values modified manually or by Java Ergonomics are shown with “:=”:
      java -XX:+PrintFlagsFinal -version | grep ':='
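A flag's final value and where it came from can also be queried from a running JVM. The sketch below assumes a HotSpot-based JVM, because it relies on `com.sun.management.HotSpotDiagnosticMXBean`, which is a HotSpot-specific extension rather than part of the Java SE specification. The class name `VmFlagExample` is hypothetical, and the two flag names are taken from the command output shown in the slides.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

/** Sketch (HotSpot only): look up a JVM flag's current value and its origin. */
final class VmFlagExample {

    static String describe(String flagName) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the flag does not exist.
        VMOption option = hotspot.getVMOption(flagName);
        // The origin reports how the value was set, e.g. DEFAULT, VM_CREATION or ERGONOMIC.
        return option.getName() + "=" + option.getValue()
                + " (origin: " + option.getOrigin() + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("InitialHeapSize"));
        System.out.println(describe("MaxHeapSize"));
    }
}
```

An origin other than `DEFAULT` corresponds roughly to the entries that `-XX:+PrintFlagsFinal` marks with “:=”.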
  • 36. Java Flags Java has a lot of tuning options:
      $ java -XX:+UnlockCommercialFeatures -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal -version | head -n 10
      [Global flags]
      uintx AdaptiveSizeDecrementScaleFactor = 4 {product}
      uintx AdaptiveSizeMajorGCDecayTimeScale = 10 {product}
      uintx AdaptiveSizePausePolicy = 0 {product}
      uintx AdaptiveSizePolicyCollectionCostMargin = 50 {product}
      uintx AdaptiveSizePolicyInitializingSteps = 20 {product}
      uintx AdaptiveSizePolicyOutputInterval = 0 {product}
      uintx AdaptiveSizePolicyWeight = 10 {product}
      uintx AdaptiveSizeThroughPutPolicy = 0 {product}
      uintx AdaptiveTimeWeight = 25 {product}
      java version "1.8.0_172"
      Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
      Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
  • 38. Java Profiling Tools Available in JDK ● Java VisualVM ● Java Mission Control
  • 39. Other Java Profiling Tools ● JProfiler - A commercially licensed Java profiling tool developed by ej-technologies ● Honest Profiler - A sampling JVM profiler without the safepoint sample bias ● Async Profiler - Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events
  • 40. Java Profiling Tools Survey by RebelLabs in 2016: http://pages.zeroturnaround.com/RebelLabs-Developer-Productivity-Report-2016.html
  • 41. Attitude toward performance work Survey by RebelLabs in 2017: https://zeroturnaround.com/rebellabs/developer-productivity-survey-2017/
  • 42. Measuring Methods for CPU Profiling ● Sampling: Monitor running code externally and check which code is executed ● Instrumentation: Include measurement code into the real code
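To illustrate the sampling approach, here is a deliberately naive sampler that periodically takes a stack trace of one thread and counts which method is on top. It is a toy under stated assumptions, not how production profilers are built: the class and method names are hypothetical, and `Thread.getStackTrace()` only observes the target thread at a safepoint, so the results carry safepoint bias.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy sampling profiler: counts the top stack frame of a target thread at fixed intervals. */
final class NaiveSampler {

    /** Takes up to `samples` snapshots of `target`, `intervalMillis` apart. */
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                // Index 0 is the most recently invoked method.
                String method = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(method, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return hits;
    }

    /** Hypothetical CPU-bound workload to be sampled. */
    static double busyWork(long iterations) {
        double sum = 0;
        for (long i = 1; i <= iterations; i++) {
            sum += Math.sqrt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> busyWork(2_000_000_000L), "worker");
        worker.setDaemon(true);
        worker.start();
        sample(worker, 50, 10)
                .forEach((method, count) -> System.out.println(count + "  " + method));
    }
}
```

The sampler never touches the workload's code, which is the defining property of sampling: the measured program runs unmodified and is only observed from outside.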
  • 45. Sampling vs. Instrumentation Sampling: ● Overhead depends on the sampling interval ● Stable overhead ● Can see execution hotspots ● Can miss methods that return faster than the sampling interval ● Can discover unknown code Instrumentation: ● Precise measurement for execution times ● No stable overhead ● More data to process
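For contrast with sampling, instrumentation places measurement code around the code being measured. Real profilers usually inject this through bytecode rewriting; the hand-written wrapper below is only a sketch of the idea, and the names `TimedCall` and `timed` are hypothetical.

```java
import java.util.function.Supplier;

/** Sketch of manual instrumentation: wrap a call and record its elapsed time. */
final class TimedCall {

    /** Runs `body`, prints its elapsed wall-clock time under `label`, and returns its result. */
    static <T> T timed(String label, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            System.out.println(label + " took " + elapsedMicros + " us");
        }
    }

    public static void main(String[] args) {
        Long sum = timed("sumTo", () -> {
            long s = 0;
            for (int i = 1; i <= 1_000_000; i++) {
                s += i;
            }
            return s;
        });
        System.out.println("sum=" + sum);
    }
}
```

Every instrumented call pays for two clock reads and a log line, so the overhead grows with the call count, which matches the "no stable overhead" and "more data to process" points in the comparison.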
  • 46. Sampling vs. Instrumentation ● Java VisualVM uses both sampling and instrumentation ● Java Flight Recorder uses sampling for hot methods ● JProfiler supports both sampling and instrumentation
  • 47. How Do Profilers Work? ● Generic profilers rely on the JVMTI spec ● JVMTI offers only safepoint sampling stack trace collection options ● Some profilers use the AsyncGetCallTrace method, which is an OpenJDK internal API call to facilitate non-safepoint collection of stack traces
  • 48. Safepoints ● A safepoint is a moment in time when a thread’s data, its internal state and representation in the JVM are, well, safe for observation by other threads in the JVM. ○ Between every 2 bytecodes (interpreter mode) ○ Backedge of non-’counted’ loops ○ Method exit ○ JNI call exit
  • 49. Problems with Profiling ● Runtime Overhead ● Interpretation of the results can be difficult ● Identifying the "crucial" parts of the software ● Identifying potential performance improvements
  • 50. Profiling Applications with Java VisualVM ● CPU Profiling: Profile the performance of the application. ● Memory Profiling: Analyze the memory usage of the application.
  • 51. Java Mission Control ● A set of powerful tools running on the Oracle JDK to monitor and manage Java applications ● Free for development use (Oracle Binary Code License) ● Available in JDK since Java 7 update 40 ● Supports Plugins ● Two main tools ○ JMX Console ○ Java Flight Recorder
  • 53. Java Flight Recorder (JFR) ● A profiling and event collection framework built into the Oracle JDK ● Gathers low-level information about the JVM and application behaviour with very low performance overhead (less than 2%) ● Allows always-on profiling in production environments ● The engine was released with Java 7 update 4 ● Commercial feature in Oracle JDK ● A main tool in Java Mission Control (since Java 7 update 40)
  • 54. JFR Events JFR collects data about three types of events: 1. Instant events: events that occur instantly 2. Sample (requestable) events: events with a user-configurable period that provide a sample of system activity 3. Duration events: events that take some time to occur. Each event has a start and an end time, and you can set a threshold so that only events lasting longer than it are recorded.
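A duration event with a threshold can be sketched with the `jdk.jfr` API. This is a hedged illustration only: it assumes JDK 11 or later, where `jdk.jfr` is a supported public API, whereas the slides target Oracle JDK 8, where JFR is a commercial feature with a different event API. The event name `sample.HashingTask`, the class names and the `inputSize` field are hypothetical.

```java
import java.util.Arrays;

import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Threshold;

/** Hypothetical duration event; the name and field are illustrative only. */
@Name("sample.HashingTask")
@Label("Hashing Task")
@Threshold("10 ms") // record only occurrences that last at least 10 ms
final class HashingTaskEvent extends Event {
    @Label("Input Size")
    int inputSize;
}

final class JfrEventSketch {

    /** Wraps a unit of work in a duration event (start time, end time, payload). */
    static int hashWithEvent(byte[] input) {
        HashingTaskEvent event = new HashingTaskEvent();
        event.inputSize = input.length;
        event.begin();                      // start time of the duration event
        int hash = Arrays.hashCode(input);  // the work being measured
        event.end();                        // end time of the duration event
        event.commit();                     // written only if a recording has this event enabled
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(hashWithEvent(new byte[] {1, 2, 3}));
    }
}
```

When no recording is running, `commit()` does nothing, so the event code can stay in place permanently. With a recording active, occurrences shorter than the threshold are dropped, which keeps the recording small.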
  • 55. Java Flight Recorder Architecture JFR consists of the following components: 1. JFR runtime - The recording engine inside the JVM that produces the recordings. 2. Flight Recorder plugin for Java Mission Control (JMC)
  • 56. Enabling Java Flight Recorder Since JFR is a commercial feature, we must unlock commercial features before running JFR. You need the following arguments: -XX:+UnlockCommercialFeatures -XX:+FlightRecorder
  • 57. Dynamically enabling JFR If you are using Java 8 update 40 (8u40) or later, you can enable JFR dynamically on a running JVM, for example with jcmd <pid> VM.unlock_commercial_features followed by jcmd <pid> JFR.start. This is useful as we don’t need to restart the server. Sometimes a restart solves the problem anyway. :) But that’s only temporary, and it’s always good to analyze the root cause of the problem.
  • 58. Improving the accuracy of JFR Method Profiler An important feature of the JFR Method Profiler is that it does not require threads to be at safepoints in order for stacks to be sampled. Generally, profilers walk stacks only at safepoints. However, by default the HotSpot JVM doesn’t provide debug metadata for non-safepoint parts of the code, so samples taken there cannot be attributed precisely. Use the following flags to improve the accuracy: -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints
  • 59. JFR Event Settings There are two event settings by default in Oracle JDK. Files are in $JAVA_HOME/jre/lib/jfr 1. Continuous - default.jfc 2. Profiling - profile.jfc
  • 60. JFR Recording Types ● Time Fixed Recordings ○ Fixed duration ○ The recording will be opened automatically in JMC at the end (if the recording was started by JMC) ● Continuous Recordings ○ No end time ○ Must be explicitly dumped
  • 61. Running Java Flight Recorder There are a few ways we can run JFR: 1. Using the JFR plugin in JMC 2. Using the command line 3. Using the Diagnostic Command
  • 62. Running Java Flight Recorder You can run multiple recordings concurrently and have different settings for each recording. However, the JFR runtime uses the same buffers for all of them, and each resulting recording contains the union of all events from all recordings active at that particular time. This means that we might get more than we asked for (but not less).
  • 63. Running JFR from JMC 1. Right-click on the JVM and select “Start Flight Recording” 2. Select the type of recording: Time fixed / Continuous 3. Select the “Event Settings” template 4. Modify the event options for the selected flight recording template (optional) 5. Modify the event details (optional)
  • 64. Running JFR from Command Line To produce a Flight Recording from the command line, you can use the “-XX:StartFlightRecording” option. E.g.: -XX:StartFlightRecording=delay=20s,duration=60s,name=Test,filename=recording.jfr,settings=profile Use the following to change the log level: -XX:FlightRecorderOptions=loglevel=info
  • 65. The Default Recording (Continuous Recording) You can also start a continuous recording from the command line using -XX:FlightRecorderOptions. -XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=/tmp,maxage=6h,settings=default The default recording can be dumped on exit. Only the default recording can be used with the dumponexit and dumponexitpath parameters. -XX:FlightRecorderOptions=defaultrecording=true,dumponexit=true,dumponexitpath=/tmp/dumponexit.jfr
  • 66. Running JFR using Diagnostic Commands The “jcmd” command can be used. ● Start a recording: jcmd <pid> JFR.start delay=20s duration=60s name=MyRecording filename=/tmp/recording.jfr settings=profile ● Check recordings: jcmd <pid> JFR.check ● Dump a recording: jcmd <pid> JFR.dump filename=/tmp/dump.jfr name=MyRecording
  • 67. Analyzing Flight Recordings ● The JFR runtime engine dumps recorded data to files with the *.jfr extension. ● These binary files can be viewed in JMC. ● There are tab groups showing certain aspects of the JVM and the Java application runtime, such as Memory, Threads, I/O etc.
  • 68. JFR Tab Groups ● General: Details of the JVM, the system, and the recording. ● Memory: Information about memory & garbage collection. ● Code: Information about methods, exceptions, compilations, and class loading. ● Threads: Information about threads and locks. ● I/O: Information about file and socket I/O. ● System: Information about the environment. ● Events: Information about the event types in the recording.
  • 69. Allocation Profiling ● Finding out where the allocations happen in your application. ● The more the application allocates, the more often the JVM has to run garbage collection.
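A rough way to see how much a piece of code allocates, before reaching for a profiler, is sketched below. It is an assumption-laden illustration, not JFR's allocation profiling: it relies on the HotSpot-specific `com.sun.management.ThreadMXBean.getThreadAllocatedBytes`, which may be unavailable or disabled on other JVMs, and the class and helper names are hypothetical. It reports how many bytes a region allocated on the current thread, not the exact call sites.

```java
import java.lang.management.ManagementFactory;

/** Measures bytes allocated by the current thread while a task runs (HotSpot-specific). */
final class AllocationSketch {

    /** Keeps allocated objects reachable so the JIT cannot optimize the allocations away. */
    static volatile Object sink;

    static long allocatedBytesDuring(Runnable task) {
        // The cast assumes a HotSpot-based JVM; other JVMs may not support this interface.
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(id);
        task.run();
        return threads.getThreadAllocatedBytes(id) - before;
    }

    /** Toy allocation-heavy workload (hypothetical). */
    static void allocateChunks(int count, int size) {
        byte[][] chunks = new byte[count][];
        for (int i = 0; i < count; i++) {
            chunks[i] = new byte[size];
        }
        sink = chunks;
    }

    public static void main(String[] args) {
        long bytes = allocatedBytesDuring(() -> allocateChunks(1_000, 1_024));
        System.out.println("allocated approximately " + bytes + " bytes");
    }
}
```

Comparing this number before and after a change gives a quick signal of whether allocation pressure went down; the JFR Memory tab group is still needed to find where the allocations come from.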
  • 70. Sample applications Let’s try some sample applications: https://github.com/chrishantha/sample-java-programs ● Hot Methods Application ● High CPU Application ● Allocations Application ● Latencies Application
  • 71. Java Just-In-Time (JIT) compiler
  • 72. Java Just-In-Time (JIT) compiler Java code is usually compiled into platform-independent bytecode (class files). The JVM is able to load the class files and execute the Java bytecode via the Java interpreter. Even though this bytecode is usually interpreted, it might also be compiled into native machine code using the JVM's Just-In-Time (JIT) compiler.
  • 73. Java Just-In-Time (JIT) compiler Unlike a normal (ahead-of-time) compiler, the JIT compiler compiles the code (bytecode) only when required. With the JIT compiler, the JVM monitors the methods executed by the interpreter and identifies the “hot methods” for compilation. After identifying the hot methods, the JVM compiles their bytecode into more efficient native code. In this way, the JVM can avoid interpreting a method each time it is executed and thereby improves the runtime performance of the application.
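The warm-up effect described above can be observed with a small experiment. This is a rough sketch, not a proper benchmark (a harness such as JMH is the usual choice for real measurements): the class `JitWarmupSketch` and its methods are hypothetical, and the timings vary between machines and JVM versions. It calls the same method in several rounds and prints the time per round; later rounds are usually faster once the method has been identified as hot and compiled.

```java
/** Times repeated rounds of the same method to make JIT warm-up visible. */
final class JitWarmupSketch {

    /** Small method that is likely to become "hot" (hypothetical workload). */
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    /** Returns the elapsed nanoseconds for each round of calls. */
    static long[] timeRounds(int rounds, int callsPerRound, int n) {
        long[] nanos = new long[rounds];
        long sink = 0;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerRound; c++) {
                sink += sumOfSquares(n);
            }
            nanos[r] = System.nanoTime() - start;
        }
        // Use the result so the JIT cannot treat the calls as dead code.
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink);
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10, 20_000, 1_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.println("round " + r + ": " + nanos[r] / 1_000 + " us");
        }
    }
}
```

Running it with `-XX:+PrintCompilation` shows when `sumOfSquares` gets compiled, which usually lines up with the drop in round times.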
  • 74. JIT Optimization Techniques ● Dead Code Elimination ○ Null Check Elimination ● Branch Prediction ● Loop Unrolling ● Inlining Methods
  • 75. JITWatch The JITWatch tool can analyze the compilation logs generated with the “-XX:+LogCompilation” flag (a diagnostic option, so it also needs -XX:+UnlockDiagnosticVMOptions). The logs generated by LogCompilation are XML-based and contain a lot of information related to JIT compilation, so these files can be very large. https://github.com/AdoptOpenJDK/jitwatch
  • 76. Premature Optimizations “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” - Donald Knuth (Image from: http://wiki.c2.com/?DonKnuth)
  • 77. Premature Optimizations ● You shouldn’t: ○ Manually inline methods. ○ Write code directly in bytecode. ○ Allocate public variables and use them as global memory throughout an application.
  • 79. Flame Graphs ● “Flame graphs are a visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately.” ● Developed by Brendan Gregg, an industry expert in computing performance and cloud computing. ● Flame Graphs can be generated using https://github.com/brendangregg/FlameGraph ○ This creates an interactive SVG http://www.brendangregg.com/flamegraphs.html
  • 81. Flame Graph: Definition ● The x-axis shows the stack profile population, sorted alphabetically ● The y-axis shows stack depth ○ The top edge shows what is on-CPU, and beneath it is its ancestry ● Each rectangle represents a stack frame. ● Box width is proportional to the total time a function was profiled directly or its children were profiled ● The colors are usually not significant, picked randomly to differentiate frames.
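The definition above can be made concrete with a small sketch of the "folded stacks" representation that the FlameGraph scripts consume: one line per unique stack, frames joined by `;` from root to leaf, followed by a sample count. The class and method names below are hypothetical and the sample data is made up; this is an illustration of the idea, not a reimplementation of `flamegraph.pl`. The width of a box corresponds to the number of samples whose stack passes through that frame at that position.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Collapses stack samples into folded form and computes flame graph box widths. */
final class FoldedStacksSketch {

    /** Each sample lists frames root-first; the result maps "a;b;c" to its sample count. */
    static Map<String, Integer> fold(List<List<String>> samples) {
        Map<String, Integer> folded = new TreeMap<>(); // TreeMap keeps stacks sorted alphabetically
        for (List<String> stack : samples) {
            folded.merge(String.join(";", stack), 1, Integer::sum);
        }
        return folded;
    }

    /** Box width for a code path: samples where it was on-CPU directly or via its children. */
    static int width(Map<String, Integer> folded, String path) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : folded.entrySet()) {
            String stack = entry.getKey();
            if (stack.equals(path) || stack.startsWith(path + ";")) {
                total += entry.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up samples: four stack traces captured by a sampling profiler.
        List<List<String>> samples = Arrays.asList(
                Arrays.asList("main", "a", "b"),
                Arrays.asList("main", "a", "b"),
                Arrays.asList("main", "a"),
                Arrays.asList("main", "c"));
        Map<String, Integer> folded = fold(samples);
        folded.forEach((stack, count) -> System.out.println(stack + " " + count));
        System.out.println("width of main;a = " + width(folded, "main;a"));
    }
}
```

In this made-up data, `main` spans all four samples, `main;a` spans three (one where `a` itself was on-CPU and two where its child `b` was), and `main;c` spans one, so `a` would be drawn three times as wide as `c`.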
  • 82. Types of Flame Graphs ● CPU - see which code-paths are hot (busy on-CPU) ● Memory - Memory Leak (and Growth) ● Off-CPU - Time spent by processes and threads when they are not running on-CPU ● Hot/Cold - both CPU and Off-CPU ● Differential - compare before and after flame graphs
  • 83. Why do we need Flame Graphs? ● Finding out why CPUs are busy is an important task when troubleshooting performance issues ● Can use a sampling profiler to see which code-paths are hot. ● Usually a profiler will dump a lot of data with thousands of lines ● Flame Graph can simply visualize the stack traces output of a sampling profiler.
  • 84. Naive Profiling: Taking Thread Dumps ● “A thread dump is a snapshot of the state of all threads that are part of the process.” ● The state of the thread is represented with a stack trace. ● A thread can be in only one state at a given point in time. ● You can take thread dumps at regular intervals to do “Naive Java Profiling”
  • 85. Sample program to profile ● Get Sample “highcpu” program from https://github.com/chrishantha/sample-java-programs ● mvn clean install ● cd highcpu ● java -jar target/highcpu.jar --help
  • 86. Flame Graph with Thread Dumps ○ i=0; while (( i++ < 30 )); do jstack $(pgrep -f highcpu) >> out.jstacks; sleep 2; done ○ cat out.jstacks | $FLAMEGRAPH_DIR/stackcollapse-jstack.pl > out.stacks-folded ○ cat out.stacks-folded | $FLAMEGRAPH_DIR/flamegraph.pl > jstack_flamegraph.svg ○ firefox jstack_flamegraph.svg
  • 87. Flame Graph with Thread Dumps
  • 88. Flame Graph with Thread Dumps (Without Thread Names) [Annotated flame graph image] ● Top edge shows the methods directly on-CPU ● Visually compare lengths ● Ancestry ● Code path ● Branches
  • 89. Flame Graphs with Java Flight Recordings ● We can generate CPU Flame Graphs from a Java Flight Recording ● Program is available at GitHub: https://github.com/chrishantha/jfr-flame-graph ● The program uses the (unsupported) JMC Parser
  • 90. Generating a Flame Graph using JFR dump ● JFR has Method Profiling Samples ○ You can view those in the “Hot Methods” and “Call Tree” tabs ● A Flame Graph can be generated using these Method Profiling Samples ● Use the following to improve the accuracy of the JFR Method Profiler: ○ -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints
  • 91. Profiling the Sample Program ● Get a Profiling Recording ○ java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=delay=10s,duration=1m,name=Profiling,filename=highcpu_profiling.jfr,settings=profile -jar target/highcpu.jar --hashing-algo SHA-512 --hashing-workers 20 --math-workers 10
  • 92. Profiling a Sample Program ● Get Sample “highcpu” program from https://github.com/chrishantha/sample-java-programs ● Get a Profiling Recording ○ java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=delay=5s,duration=1m,name=Profiling,filename=highcpu_profiling.jfr,settings=profile -jar target/highcpu.jar ● Using jfr-flame-graph ○ create_flamegraph.sh -f highcpu_profiling.jfr -i > flamegraph.svg
  • 94. Using jfr-flame-graph create_flamegraph.sh -f highcpu_profiling.jfr -i > jfr_flamegraph.svg
  • 95. Java Mixed-Mode Flame Graphs ● With Java profilers, we can get information about the Java process only. ● However, with Java Mixed-Mode Flame Graphs, we can see how much CPU time is spent in Java methods, system libraries and the kernel. ● Mixed-mode means that the Flame Graph shows profile information from both system code paths and Java code paths.
  • 96. Linux Perf (perf_events) ● System profiler ● Userspace + Kernel
  • 97. Installing “perf_events” on Ubuntu ● On terminal, type perf ● sudo apt install linux-tools-common ● sudo apt install linux-tools-generic
  • 98. The Problem with Java and Perf ● perf needs the Java symbol table. JVM doesn’t preserve frame pointers by default. ● Run sample program ○ java -jar target/highcpu.jar --hashing-algo SHA-512 --hashing-workers 20 --math-workers 10 --exit-timeout 300 ● Run perf record ○ sudo perf record -F 99 -g -p `pgrep -f highcpu` -- sleep 60 ● Display trace output ○ sudo perf script
  • 100. Preserving Frame Pointers in JVM ● Run the Java program with the JVM flag "-XX:+PreserveFramePointer" ○ java -XX:+PreserveFramePointer -jar target/highcpu.jar --hashing-algo SHA-512 --hashing-workers 20 --math-workers 10 --exit-timeout 300 ● This flag is available only in JDK 8 update 60 and above. ● Some frames may still be missing when compared to Flame Graphs generated from JFR or jstack, due to inlining. ● You can reduce the amount of inlining if you need to see more frames in the profile. ○ For example, -XX:InlineSmallCode=500
  • 101. Preserving Frame Pointers in JVM ● Run the Java program with the JVM flag "-XX:+PreserveFramePointer" ○ java -XX:+PreserveFramePointer -jar target/highcpu.jar --exit-timeout 600 ● This flag is available only in JDK 8 update 60 and above.
  • 102. How to generate Java symbol table ● Use a Java agent to generate method mappings to use with the Linux `perf` tool ○ Clone & build https://github.com/jvm-profiling-tools/perf-map-agent ● Create symbol map ○ ./create-java-perf-map.sh `pgrep -f highcpu` ● You can also use the “jmaps” tool in the FlameGraph repository to create symbol files for all Java processes. ○ export AGENT_HOME=/home/isuru/performance/git-projects/perf-map-agent ○ sudo perf record -F 499 -a -g -- sleep 30;sudo -E $FLAMEGRAPH_DIR/jmaps ● Let the Java application warm up before generating symbol maps, so that JIT-compiled methods are included.
  • 103. Generate Java Mixed-Mode Flame Graph ● Run perf and create symbol map ○ export AGENT_HOME=/home/isuru/performance/git-projects/perf-map-agent ○ sudo perf record -F 499 -a -g -- sleep 30;sudo -E $FLAMEGRAPH_DIR/jmaps ● Generate Flame Graph ○ sudo perf script -F comm,pid,tid,cpu,time,event,ip,sym,dso,trace | stackcollapse-perf.pl --pid | grep java-`pgrep -f highcpu` | flamegraph.pl --color=java --hash --width 1080 > java-mixed-mode.svg ○ firefox java-mixed-mode.svg
  • 105. Java Mixed-Mode Flame Graph for Netty
  • 106. Java Mixed-Mode Flame Graph ● Helps to understand Java CPU usage ● With Flame Graphs, we can see both Java and system profiles ● Can profile GC as well
  • 107. Linux Profiling We can use “perf”, a Linux profiler with performance counters, to profile system code paths. The Linux perf command is also called perf_events. Some perf commands: ● perf stat: obtain event counts ● perf record: record events for later reporting ● perf report: break down events by process, function, etc. ● perf top: see live event count
  • 108. Does profiling matter? ● Yes! ● Most of the performance issues are in the application code. ● Early performance testing is key. Fix problems while developing.