The biggest challenge in performance tuning is identifying the root cause of the bottleneck. Once you find it, the fix often becomes trivial. However, this detective work takes patience, skills, and effort, so we often attempt to guess the cause, by trying out tentative fixes. The result: messy code, waste of time and money, and frustration. During this talk you will learn how to correctly zoom in on the bottleneck using three levels of profiling: distributed tracing with Zipkin, metrics with Micrometer, and profiling with the Java Flight Recorder already built into your JVM. We’ll focus on the latter and learn how to read a flame graph to trace some common issues of backend systems like connection/thread pool starvation, time-consuming aspects, hot methods, and lock contention, even if these occur in library code you did not write.
1. Profiling your Java Applica2on
@victorrentea ♦ VictorRentea.ro
Code: github.com/victorrentea/performance-profiling.git
2. 👉 victorrentea.ro/training-offer
👋 Hi, I'm Victor Rentea 🇷🇴 PhD(CS)
Java Champion, 18 years of code, 10 years of teaching
Consultant & Trainer for 100+ companies:
❤ Clean Code, Architecture, Unit Tes3ng
🛠 Spring, Hibernate, Reac3ve/WebFlux
⚡ Java Performance, Secure Coding 🔐
Conference Talks on YouTube
Founder of European So<ware Cra<ers Community, > 6000 members
🔥 Free 1-hour webinars, a;er work, on clean code, design, tes>ng 👉 victorrentea.ro/community
Past events on youtube.com/vrentea
Father of 👧👦, servant of a 🐈, weekend gardener 🌼
@victorrentea
VictorRentea.ro
3. 3 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
The response *me of your endpoint is too slow 😱
🎉 Congratulations! 🎉
You made it to production, and you're taking serious load!
Where to start?
4. 4 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Log timestamps
log.info("Before call"); !// 2023-05-23 00:01:50.000 !!...
!// suspect code
log.info("After call"); !// 2023-05-23 00:01:53.100 !!...
Measuring Method Run*me – Old-School Style
t1-t0
long t0 = currentTimeMillis();
List<CommentDto> comments = client.fetchComments(loanId); // suspect code
long t1 = currentTimeMillis();
System.out.println("Took millis: " + (t1 - t0));
5. 5 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Micrometer
@Timed !// wraps the method with an aspect that monitors spent time
List<CommentDto> fetchComments(Long id) {
!// suspect code
}
then, in a Spring application
GET /actuator/prometheus returns:
method_timed_seconds_sum{method="fetchComments",} 100.0
method_timed_seconds_count{method="fetchComments",} 400
è avg = 0.25 s/call
sampled again after 5 seconds:
method_timed_seconds_sum{method="fetchComments",} 150.0 – 100 = 50 s
method_timed_seconds_count{method="fetchComments",} 500 – 400 = 100 calls
è avg = 50 s / 100 = 0.5 s/call (spike)
time
average[ms]
250
500
6. 6 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Micrometer API
https://www.baeldung.com/micrometer
!// programatic alternative to @Timed
meterRegistry.timer("name").record(() -> { !/*suspect code!*/ });
meterRegistry.timer("name").record(t1 - t0, MILLISECONDS);
!// accumulator
meterRegistry.counter("earnings").increment(purchase);
!// current value
meterRegistry.gauge("player-count", currentPlayers.size());
!// eg. "95% requests took ≤ 100 ms"
@Timed(percentiles = {0.95}, histogram = true)
7. 7 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
§Micrometer records metrics
a) your own
b) key framework metrics for common issues
c) opBonal metrics about Tomcat, executors, ...
§The metrics are regularly polled/pushed to other tools for storage
- eg. Prometheus
§... and later used by (eg.) Grafana to:
- Display impressive charts
- Aggregate metrics
- Raise alerts
- ...
Some examples è
Observing a System with Micrometer
8. 8 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
§ hBp_server_requests_seconds_sum{method="GET",uri="/profile/showcase/{id}",} 165.04
hBp_server_requests_seconds_count{method="GET",uri="/profile/showcase/{id}",} 542.0
§ tomcat_connec>ons_current_connec>ons{...} 298 ⚠ 98 requests wai>ng in queue
tomcat_threads_busy_threads{...} 200 ⚠ = using 100% of the threads => thread pool starva>on?
§ jvm_gc_memory_allocated_bytes_total 1.8874368E9
jvm_memory_usage_a;er_gc_percent{...} 45.15 ⚠ >30% => increase max heap?
§ hikaricp_connec>ons_acquire_seconds_sum{pool="HikariPool-1",} 80.37 ⚠ conn pool starva>on?
hikaricp_connec>ons_acquire_seconds_count{pool="HikariPool-1",} 551.0
§ cache_gets_total{cache="cache1",result="miss",} 0.99 ⚠ 99% cache miss = bad config issue?
cache_gets_total{cache="cache1",result="hit",} 0.01
§ logback_events_total{level="error",} 1255.0 ⚠ + 1000 than 5 seconds ago => 😱
§ my_own_custom_metric xyz
Some Metrics
available at /actuator/prometheus
9. 9 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
?
Monitoring a single system is easy!
10. 10 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Compared to a Distributed Ecosystem
1st challenge: trace requests
A B C D E
Monitoring a single system is easy!
11. 11 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
§TraceID header is read from the incoming HTTP request/MQ message
- If missing, it is generated by first system in the call chain
§TraceID is propagated inside the app:
- Methods called (via ThreadLocal)
- Any logs printed (via Logback MDC) – cri>cal for distributed log analysis
- Upstream operators in a Reactor chain (via ReactorContext in WebFlux)
- Async methods submiBed to a DI-managed *TaskExecutor (via a TaskDecorator)
§TraceID header is sent in any outgoing request/message
- ... sent via a DI-injected bean: @Autowired RestTemplate, RabbitTemplate, ...
Distributed Request Tracing
h:ps://www.baeldung.com/spring-cloud-sleuth-single-applica<on
A B
<header> TraceId
12. 12 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
§The TraceId is available for any processing we do
§Every system reports the span of :me spent working on a TraceId
- For performance, a frac>on of spans are reported (eg 1%)
§... to a central system that builds a :me chart of all systems
example è
Distributed Request Sampling
13. 13 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Distributed Request Sampling with Zipkin
14. 14 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Distributed Request Sampling with Zipkin
15. 15 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
§Repeated API calls, by accident 😱
§API calls in a loop, should be fetched in batches
- GET /item/1, GET /item/3, ....
§Long chains of REST calls
- A à B à C à D à E
§System-in-the-middle
§Independent API calls than can be parallelized
§Long spans not waiAng for other systems => zoom in 🔎
Distributed Processing An*-PaKerns ⚠
16. 16 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Profiling
Blackbox: inter-system communica/on
(Zipkin, ...)
Graybox: report key framework metrics
(Micrometer)
Whitebox: profile methods inside JVM
(Java Flight Recorderè)
app
lib-a lib-b
methodA() methodB()
17. 17 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
How to find methods that take a long 1me?
💡
Naive idea: Instrument your code with a -javaagent
that measures the execuPon of every single executed Java method
Adds a huge CPU+MEMORY overhead
18. 18 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Java Flight Recorder
The Tools of the Java Gods
19. 19 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Java Flight Recorder (JFR)
Main features:
§ Capture stack traces of running threads every second
§ Record internal JVM events about locks, files, sockets, GC ac>vity, ... or your own
§ Usable in prod: op>mized to add x-low overhead (<2%)
§ Free since Java 11
Usage:
§ Local JVM using Java Mission Control or IntelliJ ⬆
§ Remote JVM => generate a .jfr file => download => analyze
First, enable JFR via -XX:+FlightRecorder, then start the recording:
- Con3nuous from startup: -XX:StartFlightRecording=....
- Via command line (SSH): jcmd <JAVAPID> JFR.start
- Via endpoints JMX or Actuator
- Via tools, eg Glowroot in our experiment
20. 26 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Flame Graph
best way to visualize the output of an execuAon profiler
21. 27 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Thread.sleep(Native Method)
Flame.f(Flame.java:21)
Flame.entry(Flame.java:17)
Flame.dummy(Flame.java:14)
Thread.sleep(Native Method)
Flame.g(Flame.java:24)
Flame.entry(Flame.java:18)
Flame.dummy(Flame.java:14)
2 samples (33%)
4 samples (66%)
JFR samples the stack traces of running threads every second
22. 28 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Thread.sleep(Native Method)
Flame.g(Flame.java:24)
Flame.entry(Flame.java:18)
Flame.dummy(Flame.java:14)
Thread.sleep(Native Method)
Flame.f(Flame.java:21)
Flame.entry(Flame.java:17)
Flame.dummy(Flame.java:14)
2 samples (33%) 4 samples (66%)
Length of each bar is propor?onal to the number of samples (?me)
Building a Flame Graph
JFR samples the stack traces of running threads every second
Flame.dummy(Flame.java:14)
23. 29 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Thread.sleep(Native Method) Thread.sleep(Native Method)
Flame.f(Flame.java:21) Flame.g(Flame.java:24)
Flame.entry(Flame.java:17) Flame.entry(Flame.java:18)
Flame.dummy(Flame.java:14)
Data is merged by line number
2 samples (33%) 4 samples (66%)
Length of each bar is proportional to the number of samples (time)
Building a Flame Graph
JFR samples the stack traces of running threads every second
Flame.dummy(Flame.java:14)
24. 30 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Thread.sleep(Native Method) Thread.sleep(Native Method)
Flame.f(Flame.java:21) Flame.g(Flame.java:24)
Flame.entry(Flame.java:17) Flame.entry(Flame.java:18)
Flame.dummy(Flame.java:14)
2 samples (33%) 4 samples (66%)
IN REAL LIFE
Unsafe.park(Native Method)
......50 more lines↕......
SocketInputStream.socketRead0(Native Method)
......40 more lines↕......
423
210
After recording 1 minute of heavy parallel load:
.... 100 lines of Aspects, Spring, Tomcat ...
Data is merged by line number
Length of each bar is proportional to the number of samples (time)
Building a Flame Graph
JFR samples the stack traces of running threads every second
BLOCK
NETWORK
25. 31 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
ProfiledApp
Spring Boot 2.7
http://localhost:8080
Load Tests
Java app
(Gatling.io)
23 parallel requests
(closed world model)
PostgreSQL
in a Local Docker
localhost:5432
WireMock
in a Local Docker
hDp://localhost:9999
remote API responses
with 40ms delay
Glowroot.org
-javaagent:glowroot.jar
👉 http://localhost:4000
Experiment Environment
ToxiProxy
github.com/Shopify/toxiproxy
installed locally
localhost:55432
+3ms network latency
JFR
26. 32 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
EXPERIMENT
27. 33 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Glowroot.org
§Extremely simple application profiler
- Add VM arg: -javaagent:/path/to/glowroot.jar
- Open http://localhost:4000
§Monitors
- Methods Profiling, displayed as Flame Graphs
- SQLs
- API calls
- Typical bottlenecks
- Data per endpoint
§Free
- Probably not production-ready, more for learning/debugging
28. 35 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Split Points
ß 85% of the query time is spent acquiring a JDBC Connection à
(aka connection starvation issue)
Actual waiting
for SQL server
JDBC
Connection Pool
Starvation
29. 36 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
False Split
due to Just-In-Time Compiler (JIT) Optimization
The Same Method is called via 2 paths !!
Because JIT optimized a reflection call
(1)NativeMethodAccessorImpl è (2)GeneratedMethodAccessor
Fix: Warmup JVM: record in prod, or re-run load test
30. 37 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Time-consuming Aspects
By default, TransactionInterceptor (@Transactional)
acquires a connection before entering the annotated method
:382
:388
31. 38 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Lock Contention
Automatic analysis by Java Mission Control
of the recorded .jfr file:
https://wiki.openjdk.org/display/jmc/Releases
ß Waiting to enter a synchronized method à
(native C code is not recorded by JFR)
32. 39 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
HashSet<Long> largeSet = ...; ß ~10K items
largeSet.remove(13); ß constant time O(1)
List<Long> largerList = ...; ß ~10K items
largeSet.removeAll(largerList); * (N x remove)
* Expected to run super-fast O(size(list))
* In Production, with 10.000 items ...
2 SECONDS !!!!
WHY?!!
Gory Library QUIRKS
33. 40 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Flamegraph shows:
HashSet<Long> largeSet = ...; ß ~10K items
largeSet.remove(13); ß constant time O(1)
List<Long> largerList = ...; ß ~10K items
largeSet.removeAll(largerList); *
2 seconds are lost doing
ArrayList.contains !!
Gory Library QUIRKS
34. 41 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Gory Library QUIRKS
HashSet<Long> largeSet = ...; ß ~10K items
largeSet.remove(13); ß constant time O(1)
List<Long> largerList = ...; ß ~10K items
largeSet.removeAll(largerList); *
If the collection C passed as parameter
is larger then current HashSet,
iterate the elements of the HashSet
and remove them if C.contains(e) = O(N),
since in our case, C was a List
35. 43 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Best Feature of Java 21 LTS?
🎉 🎉 🎉 GA on 19 sep 2023 🎉 🎉 🎉
Virtual Threads
36. 44 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
JFR recording of
❤Virtual Threads❤
in a Spring Boot 3 app
running on Java 19/20
37. 45 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
JFR Profiling – Lessons Learned
✅ "Sees" inside unknown code / library
❌ Can't see into naAve C code
⚠ 1 sample/second è requires load tests or ran in prod
✅ Can be used in produc5on (x-low overhead)
⚠ Tuning requires paAence, expectaAons, and library awareness
⚠ Sampling Biases and JIT-influenced if profiling CPU-bound code
❌ Cannot follow async calls/reacAve chains (change threads)
✅ Can record Virtual Threads[JEP-444 in Java 21 LTS]
and thread forking done using Structured Concurrency[JEP-428 in Java19+]
38. 47 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Conclusions
§Zoom in on the performance issue via distributed tracing (Zipkin)
§Study default metrics collected by Micrometer & expose more
§If bo_leneck is sAll unclear, or you can't use above tools ⬆,
record prod or load-tests using JFR
- Load the .jfr recording in IntelliJ or in JMC for automa6c analysis
§OpAmize code & config, measuring the benefit vs. risk/pain
39. Profiling your Java Applica2on
victor.rentea@gmail.com ♦ ♦ @victorrentea ♦ VictorRentea.ro
Find the code at: https://github.com/victorrentea/performance-profiling.git
Join us!
victorrentea.ro/community