Profiling your Java Applica2on
@victorrentea ♦ VictorRentea.ro
Code: github.com/victorrentea/performance-profiling.git
👉 victorrentea.ro/training-offer
👋 Hi, I'm Victor Rentea 🇷🇴 PhD(CS)
Java Champion, 18 years of code, 10 years of teaching
Consultant & Trainer for 100+ companies:
❤ Clean Code, Architecture, Unit Tes3ng
🛠 Spring, Hibernate, Reac3ve/WebFlux
⚡ Java Performance, Secure Coding 🔐
Conference Talks on YouTube
Founder of European So<ware Cra<ers Community, > 6000 members
🔥 Free 1-hour webinars, a;er work, on clean code, design, tes>ng 👉 victorrentea.ro/community
Past events on youtube.com/vrentea
Father of 👧👦, servant of a 🐈, weekend gardener 🌼
@victorrentea
VictorRentea.ro
3 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
The response *me of your endpoint is too slow 😱
🎉 Congratulations! 🎉
You made it to production, and you're taking serious load!
Where to start?
4 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Log timestamps
log.info("Before call"); !// 2023-05-23 00:01:50.000 !!...
!// suspect code
log.info("After call"); !// 2023-05-23 00:01:53.100 !!...
Measuring Method Run*me – Old-School Style
t1-t0
long t0 = currentTimeMillis();
List<CommentDto> comments = client.fetchComments(loanId); // suspect code
long t1 = currentTimeMillis();
System.out.println("Took millis: " + (t1 - t0));
5 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Micrometer
@Timed !// wraps the method with an aspect that monitors spent time
List<CommentDto> fetchComments(Long id) {
!// suspect code
}
then, in a Spring application
GET /actuator/prometheus returns:
method_timed_seconds_sum{method="fetchComments",} 100.0
method_timed_seconds_count{method="fetchComments",} 400
è avg = 0.25 s/call
sampled again after 5 seconds:
method_timed_seconds_sum{method="fetchComments",} 150.0 – 100 = 50 s
method_timed_seconds_count{method="fetchComments",} 500 – 400 = 100 calls
è avg = 50 s / 100 = 0.5 s/call (spike)
time
average[ms]
250
500
6 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Micrometer API
https://www.baeldung.com/micrometer
!// programatic alternative to @Timed
meterRegistry.timer("name").record(() -> { !/*suspect code!*/ });
meterRegistry.timer("name").record(t1 - t0, MILLISECONDS);
!// accumulator
meterRegistry.counter("earnings").increment(purchase);
!// current value
meterRegistry.gauge("player-count", currentPlayers.size());
!// eg. "95% requests took ≤ 100 ms"
@Timed(percentiles = {0.95}, histogram = true)
7 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
§Micrometer records metrics
a) your own
b) key framework metrics for common issues
c) opBonal metrics about Tomcat, executors, ...
§The metrics are regularly polled/pushed to other tools for storage
- eg. Prometheus
§... and later used by (eg.) Grafana to:
- Display impressive charts
- Aggregate metrics
- Raise alerts
- ...
Some examples è
Observing a System with Micrometer
8 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
§ hBp_server_requests_seconds_sum{method="GET",uri="/profile/showcase/{id}",} 165.04
hBp_server_requests_seconds_count{method="GET",uri="/profile/showcase/{id}",} 542.0
§ tomcat_connec>ons_current_connec>ons{...} 298 ⚠ 98 requests wai>ng in queue
tomcat_threads_busy_threads{...} 200 ⚠ = using 100% of the threads => thread pool starva>on?
§ jvm_gc_memory_allocated_bytes_total 1.8874368E9
jvm_memory_usage_a;er_gc_percent{...} 45.15 ⚠ >30% => increase max heap?
§ hikaricp_connec>ons_acquire_seconds_sum{pool="HikariPool-1",} 80.37 ⚠ conn pool starva>on?
hikaricp_connec>ons_acquire_seconds_count{pool="HikariPool-1",} 551.0
§ cache_gets_total{cache="cache1",result="miss",} 0.99 ⚠ 99% cache miss = bad config issue?
cache_gets_total{cache="cache1",result="hit",} 0.01
§ logback_events_total{level="error",} 1255.0 ⚠ + 1000 than 5 seconds ago => 😱
§ my_own_custom_metric xyz
Some Metrics
available at /actuator/prometheus
9 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
?
Monitoring a single system is easy!
10 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Compared to a Distributed Ecosystem
1st challenge: trace requests
A B C D E
Monitoring a single system is easy!
11 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
§TraceID header is read from the incoming HTTP request/MQ message
- If missing, it is generated by first system in the call chain
§TraceID is propagated inside the app:
- Methods called (via ThreadLocal)
- Any logs printed (via Logback MDC) – cri>cal for distributed log analysis
- Upstream operators in a Reactor chain (via ReactorContext in WebFlux)
- Async methods submiBed to a DI-managed *TaskExecutor (via a TaskDecorator)
§TraceID header is sent in any outgoing request/message
- ... sent via a DI-injected bean: @Autowired RestTemplate, RabbitTemplate, ...
Distributed Request Tracing
h:ps://www.baeldung.com/spring-cloud-sleuth-single-applica<on
A B
<header> TraceId
12 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
§The TraceId is available for any processing we do
§Every system reports the span of :me spent working on a TraceId
- For performance, a frac>on of spans are reported (eg 1%)
§... to a central system that builds a :me chart of all systems
example è
Distributed Request Sampling
13 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Distributed Request Sampling with Zipkin
14 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Distributed Request Sampling with Zipkin
15 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
§Repeated API calls, by accident 😱
§API calls in a loop, should be fetched in batches
- GET /item/1, GET /item/3, ....
§Long chains of REST calls
- A à B à C à D à E
§System-in-the-middle
§Independent API calls than can be parallelized
§Long spans not waiAng for other systems => zoom in 🔎
Distributed Processing An*-PaKerns ⚠
16 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Profiling
Blackbox: inter-system communica/on
(Zipkin, ...)
Graybox: report key framework metrics
(Micrometer)
Whitebox: profile methods inside JVM
(Java Flight Recorderè)
app
lib-a lib-b
methodA() methodB()
17 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
How to find methods that take a long 1me?
💡
Naive idea: Instrument your code with a -javaagent
that measures the execuPon of every single executed Java method
Adds a huge CPU+MEMORY overhead
18 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Java Flight Recorder
The Tools of the Java Gods
19 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Java Flight Recorder (JFR)
Main features:
§ Capture stack traces of running threads every second
§ Record internal JVM events about locks, files, sockets, GC ac>vity, ... or your own
§ Usable in prod: op>mized to add x-low overhead (<2%)
§ Free since Java 11
Usage:
§ Local JVM using Java Mission Control or IntelliJ ⬆
§ Remote JVM => generate a .jfr file => download => analyze
First, enable JFR via -XX:+FlightRecorder, then start the recording:
- Con3nuous from startup: -XX:StartFlightRecording=....
- Via command line (SSH): jcmd <JAVAPID> JFR.start
- Via endpoints JMX or Actuator
- Via tools, eg Glowroot in our experiment
26 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Flame Graph
best way to visualize the output of an execuAon profiler
27 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Thread.sleep(Native Method)
Flame.f(Flame.java:21)
Flame.entry(Flame.java:17)
Flame.dummy(Flame.java:14)
Thread.sleep(Native Method)
Flame.g(Flame.java:24)
Flame.entry(Flame.java:18)
Flame.dummy(Flame.java:14)
2 samples (33%)
4 samples (66%)
JFR samples the stack traces of running threads every second
28 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Thread.sleep(Native Method)
Flame.g(Flame.java:24)
Flame.entry(Flame.java:18)
Flame.dummy(Flame.java:14)
Thread.sleep(Native Method)
Flame.f(Flame.java:21)
Flame.entry(Flame.java:17)
Flame.dummy(Flame.java:14)
2 samples (33%) 4 samples (66%)
Length of each bar is propor?onal to the number of samples (?me)
Building a Flame Graph
JFR samples the stack traces of running threads every second
Flame.dummy(Flame.java:14)
29 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Thread.sleep(Native Method) Thread.sleep(Native Method)
Flame.f(Flame.java:21) Flame.g(Flame.java:24)
Flame.entry(Flame.java:17) Flame.entry(Flame.java:18)
Flame.dummy(Flame.java:14)
Data is merged by line number
2 samples (33%) 4 samples (66%)
Length of each bar is proportional to the number of samples (time)
Building a Flame Graph
JFR samples the stack traces of running threads every second
Flame.dummy(Flame.java:14)
30 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Thread.sleep(Native Method) Thread.sleep(Native Method)
Flame.f(Flame.java:21) Flame.g(Flame.java:24)
Flame.entry(Flame.java:17) Flame.entry(Flame.java:18)
Flame.dummy(Flame.java:14)
2 samples (33%) 4 samples (66%)
IN REAL LIFE
Unsafe.park(Native Method)
......50 more lines↕......
SocketInputStream.socketRead0(Native Method)
......40 more lines↕......
423
210
After recording 1 minute of heavy parallel load:
.... 100 lines of Aspects, Spring, Tomcat ...
Data is merged by line number
Length of each bar is proportional to the number of samples (time)
Building a Flame Graph
JFR samples the stack traces of running threads every second
BLOCK
NETWORK
31 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
ProfiledApp
Spring Boot 2.7
http://localhost:8080
Load Tests
Java app
(Gatling.io)
23 parallel requests
(closed world model)
PostgreSQL
in a Local Docker
localhost:5432
WireMock
in a Local Docker
hDp://localhost:9999
remote API responses
with 40ms delay
Glowroot.org
-javaagent:glowroot.jar
👉 http://localhost:4000
Experiment Environment
ToxiProxy
github.com/Shopify/toxiproxy
installed locally
localhost:55432
+3ms network latency
JFR
32 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
EXPERIMENT
33 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Glowroot.org
§Extremely simple application profiler
- Add VM arg: -javaagent:/path/to/glowroot.jar
- Open http://localhost:4000
§Monitors
- Methods Profiling, displayed as Flame Graphs
- SQLs
- API calls
- Typical bottlenecks
- Data per endpoint
§Free
- Probably not production-ready, more for learning/debugging
35 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Split Points
ß 85% of the query time is spent acquiring a JDBC Connection à
(aka connection starvation issue)
Actual waiting
for SQL server
JDBC
Connection Pool
Starvation
36 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
False Split
due to Just-In-Time Compiler (JIT) Optimization
The Same Method is called via 2 paths !!
Because JIT optimized a reflection call
(1)NativeMethodAccessorImpl è (2)GeneratedMethodAccessor
Fix: Warmup JVM: record in prod, or re-run load test
37 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Time-consuming Aspects
By default, TransactionInterceptor (@Transactional)
acquires a connection before entering the annotated method
:382
:388
38 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Lock Contention
Automatic analysis by Java Mission Control
of the recorded .jfr file:
https://wiki.openjdk.org/display/jmc/Releases
ß Waiting to enter a synchronized method à
(native C code is not recorded by JFR)
39 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
HashSet<Long> largeSet = ...; ß ~10K items
largeSet.remove(13); ß constant time O(1)
List<Long> largerList = ...; ß ~10K items
largeSet.removeAll(largerList); * (N x remove)
* Expected to run super-fast O(size(list))
* In Production, with 10.000 items ...
2 SECONDS !!!!
WHY?!!
Gory Library QUIRKS
40 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Flamegraph shows:
HashSet<Long> largeSet = ...; ß ~10K items
largeSet.remove(13); ß constant time O(1)
List<Long> largerList = ...; ß ~10K items
largeSet.removeAll(largerList); *
2 seconds are lost doing
ArrayList.contains !!
Gory Library QUIRKS
41 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Gory Library QUIRKS
HashSet<Long> largeSet = ...; ß ~10K items
largeSet.remove(13); ß constant time O(1)
List<Long> largerList = ...; ß ~10K items
largeSet.removeAll(largerList); *
If the collection C passed as parameter
is larger then current HashSet,
iterate the elements of the HashSet
and remove them if C.contains(e) = O(N),
since in our case, C was a List
43 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
Best Feature of Java 21 LTS?
🎉 🎉 🎉 GA on 19 sep 2023 🎉 🎉 🎉
Virtual Threads
44 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
JFR recording of
❤Virtual Threads❤
in a Spring Boot 3 app
running on Java 19/20
45 VictorRentea.ro
a training by
Join the European Software Crafters community at victorrentea.ro/community
JFR Profiling – Lessons Learned
✅ "Sees" inside unknown code / library
❌ Can't see into naAve C code
⚠ 1 sample/second è requires load tests or ran in prod
✅ Can be used in produc5on (x-low overhead)
⚠ Tuning requires paAence, expectaAons, and library awareness
⚠ Sampling Biases and JIT-influenced if profiling CPU-bound code
❌ Cannot follow async calls/reacAve chains (change threads)
✅ Can record Virtual Threads[JEP-444 in Java 21 LTS]
and thread forking done using Structured Concurrency[JEP-428 in Java19+]
47 VictorRentea.ro
a training by
Join the European So3ware Cra3ers community at victorrentea.ro/community
Conclusions
§Zoom in on the performance issue via distributed tracing (Zipkin)
§Study default metrics collected by Micrometer & expose more
§If bo_leneck is sAll unclear, or you can't use above tools ⬆,
record prod or load-tests using JFR
- Load the .jfr recording in IntelliJ or in JMC for automa6c analysis
§OpAmize code & config, measuring the benefit vs. risk/pain
Profiling your Java Applica2on
victor.rentea@gmail.com ♦ ♦ @victorrentea ♦ VictorRentea.ro
Find the code at: https://github.com/victorrentea/performance-profiling.git
Join us!
victorrentea.ro/community

Profiling your Java Application

  • 1.
    Profiling your JavaApplica2on @victorrentea ♦ VictorRentea.ro Code: github.com/victorrentea/performance-profiling.git
  • 2.
    👉 victorrentea.ro/training-offer 👋 Hi,I'm Victor Rentea 🇷🇴 PhD(CS) Java Champion, 18 years of code, 10 years of teaching Consultant & Trainer for 100+ companies: ❤ Clean Code, Architecture, Unit Tes3ng 🛠 Spring, Hibernate, Reac3ve/WebFlux ⚡ Java Performance, Secure Coding 🔐 Conference Talks on YouTube Founder of European So<ware Cra<ers Community, > 6000 members 🔥 Free 1-hour webinars, a;er work, on clean code, design, tes>ng 👉 victorrentea.ro/community Past events on youtube.com/vrentea Father of 👧👦, servant of a 🐈, weekend gardener 🌼 @victorrentea VictorRentea.ro
  • 3.
    3 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community The response *me of your endpoint is too slow 😱 🎉 Congratulations! 🎉 You made it to production, and you're taking serious load! Where to start?
  • 4.
    4 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Log timestamps log.info("Before call"); !// 2023-05-23 00:01:50.000 !!... !// suspect code log.info("After call"); !// 2023-05-23 00:01:53.100 !!... Measuring Method Run*me – Old-School Style t1-t0 long t0 = currentTimeMillis(); List<CommentDto> comments = client.fetchComments(loanId); // suspect code long t1 = currentTimeMillis(); System.out.println("Took millis: " + (t1 - t0));
  • 5.
    5 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Micrometer @Timed !// wraps the method with an aspect that monitors spent time List<CommentDto> fetchComments(Long id) { !// suspect code } then, in a Spring application GET /actuator/prometheus returns: method_timed_seconds_sum{method="fetchComments",} 100.0 method_timed_seconds_count{method="fetchComments",} 400 è avg = 0.25 s/call sampled again after 5 seconds: method_timed_seconds_sum{method="fetchComments",} 150.0 – 100 = 50 s method_timed_seconds_count{method="fetchComments",} 500 – 400 = 100 calls è avg = 50 s / 100 = 0.5 s/call (spike) time average[ms] 250 500
  • 6.
    6 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Micrometer API https://www.baeldung.com/micrometer !// programatic alternative to @Timed meterRegistry.timer("name").record(() -> { !/*suspect code!*/ }); meterRegistry.timer("name").record(t1 - t0, MILLISECONDS); !// accumulator meterRegistry.counter("earnings").increment(purchase); !// current value meterRegistry.gauge("player-count", currentPlayers.size()); !// eg. "95% requests took ≤ 100 ms" @Timed(percentiles = {0.95}, histogram = true)
  • 7.
    7 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community §Micrometer records metrics a) your own b) key framework metrics for common issues c) opBonal metrics about Tomcat, executors, ... §The metrics are regularly polled/pushed to other tools for storage - eg. Prometheus §... and later used by (eg.) Grafana to: - Display impressive charts - Aggregate metrics - Raise alerts - ... Some examples è Observing a System with Micrometer
  • 8.
    8 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community § hBp_server_requests_seconds_sum{method="GET",uri="/profile/showcase/{id}",} 165.04 hBp_server_requests_seconds_count{method="GET",uri="/profile/showcase/{id}",} 542.0 § tomcat_connec>ons_current_connec>ons{...} 298 ⚠ 98 requests wai>ng in queue tomcat_threads_busy_threads{...} 200 ⚠ = using 100% of the threads => thread pool starva>on? § jvm_gc_memory_allocated_bytes_total 1.8874368E9 jvm_memory_usage_a;er_gc_percent{...} 45.15 ⚠ >30% => increase max heap? § hikaricp_connec>ons_acquire_seconds_sum{pool="HikariPool-1",} 80.37 ⚠ conn pool starva>on? hikaricp_connec>ons_acquire_seconds_count{pool="HikariPool-1",} 551.0 § cache_gets_total{cache="cache1",result="miss",} 0.99 ⚠ 99% cache miss = bad config issue? cache_gets_total{cache="cache1",result="hit",} 0.01 § logback_events_total{level="error",} 1255.0 ⚠ + 1000 than 5 seconds ago => 😱 § my_own_custom_metric xyz Some Metrics available at /actuator/prometheus
  • 9.
    9 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community ? Monitoring a single system is easy!
  • 10.
    10 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Compared to a Distributed Ecosystem 1st challenge: trace requests A B C D E Monitoring a single system is easy!
  • 11.
    11 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community §TraceID header is read from the incoming HTTP request/MQ message - If missing, it is generated by first system in the call chain §TraceID is propagated inside the app: - Methods called (via ThreadLocal) - Any logs printed (via Logback MDC) – cri>cal for distributed log analysis - Upstream operators in a Reactor chain (via ReactorContext in WebFlux) - Async methods submiBed to a DI-managed *TaskExecutor (via a TaskDecorator) §TraceID header is sent in any outgoing request/message - ... sent via a DI-injected bean: @Autowired RestTemplate, RabbitTemplate, ... Distributed Request Tracing h:ps://www.baeldung.com/spring-cloud-sleuth-single-applica<on A B <header> TraceId
  • 12.
    12 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community §The TraceId is available for any processing we do §Every system reports the span of :me spent working on a TraceId - For performance, a frac>on of spans are reported (eg 1%) §... to a central system that builds a :me chart of all systems example è Distributed Request Sampling
  • 13.
    13 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Distributed Request Sampling with Zipkin
  • 14.
    14 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Distributed Request Sampling with Zipkin
  • 15.
    15 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community §Repeated API calls, by accident 😱 §API calls in a loop, should be fetched in batches - GET /item/1, GET /item/3, .... §Long chains of REST calls - A à B à C à D à E §System-in-the-middle §Independent API calls than can be parallelized §Long spans not waiAng for other systems => zoom in 🔎 Distributed Processing An*-PaKerns ⚠
  • 16.
    16 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Profiling Blackbox: inter-system communica/on (Zipkin, ...) Graybox: report key framework metrics (Micrometer) Whitebox: profile methods inside JVM (Java Flight Recorderè) app lib-a lib-b methodA() methodB()
  • 17.
    17 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community How to find methods that take a long 1me? 💡 Naive idea: Instrument your code with a -javaagent that measures the execuPon of every single executed Java method Adds a huge CPU+MEMORY overhead
  • 18.
    18 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Java Flight Recorder The Tools of the Java Gods
  • 19.
    19 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Java Flight Recorder (JFR) Main features: § Capture stack traces of running threads every second § Record internal JVM events about locks, files, sockets, GC ac>vity, ... or your own § Usable in prod: op>mized to add x-low overhead (<2%) § Free since Java 11 Usage: § Local JVM using Java Mission Control or IntelliJ ⬆ § Remote JVM => generate a .jfr file => download => analyze First, enable JFR via -XX:+FlightRecorder, then start the recording: - Con3nuous from startup: -XX:StartFlightRecording=.... - Via command line (SSH): jcmd <JAVAPID> JFR.start - Via endpoints JMX or Actuator - Via tools, eg Glowroot in our experiment
  • 20.
    26 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Flame Graph best way to visualize the output of an execuAon profiler
  • 21.
    27 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.entry(Flame.java:17) Flame.dummy(Flame.java:14) Thread.sleep(Native Method) Flame.g(Flame.java:24) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) JFR samples the stack traces of running threads every second
  • 22.
    28 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Flame.g(Flame.java:24) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.entry(Flame.java:17) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) Length of each bar is propor?onal to the number of samples (?me) Building a Flame Graph JFR samples the stack traces of running threads every second Flame.dummy(Flame.java:14)
  • 23.
    29 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Thread.sleep(Native Method) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.g(Flame.java:24) Flame.entry(Flame.java:17) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Data is merged by line number 2 samples (33%) 4 samples (66%) Length of each bar is proportional to the number of samples (time) Building a Flame Graph JFR samples the stack traces of running threads every second Flame.dummy(Flame.java:14)
  • 24.
    30 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.g(Flame.java:24) Flame.entry(Flame.java:17) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) IN REAL LIFE Unsafe.park(Native Method) ......50 more lines↕...... SocketInputStream.socketRead0(Native Method) ......40 more lines↕...... 423 210 After recording 1 minute of heavy parallel load: .... 100 lines of Aspects, Spring, Tomcat ... Data is merged by line number Length of each bar is proportional to the number of samples (time) Building a Flame Graph JFR samples the stack traces of running threads every second BLOCK NETWORK
  • 25.
    31 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community ProfiledApp Spring Boot 2.7 http://localhost:8080 Load Tests Java app (Gatling.io) 23 parallel requests (closed world model) PostgreSQL in a Local Docker localhost:5432 WireMock in a Local Docker hDp://localhost:9999 remote API responses with 40ms delay Glowroot.org -javaagent:glowroot.jar 👉 http://localhost:4000 Experiment Environment ToxiProxy github.com/Shopify/toxiproxy installed locally localhost:55432 +3ms network latency JFR
  • 26.
    32 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community EXPERIMENT
  • 27.
    33 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Glowroot.org §Extremely simple application profiler - Add VM arg: -javaagent:/path/to/glowroot.jar - Open http://localhost:4000 §Monitors - Methods Profiling, displayed as Flame Graphs - SQLs - API calls - Typical bottlenecks - Data per endpoint §Free - Probably not production-ready, more for learning/debugging
  • 28.
    35 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Split Points ß 85% of the query time is spent acquiring a JDBC Connection à (aka connection starvation issue) Actual waiting for SQL server JDBC Connection Pool Starvation
  • 29.
    36 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community False Split due to Just-In-Time Compiler (JIT) Optimization The Same Method is called via 2 paths !! Because JIT optimized a reflection call (1)NativeMethodAccessorImpl è (2)GeneratedMethodAccessor Fix: Warmup JVM: record in prod, or re-run load test
  • 30.
    37 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Time-consuming Aspects By default, TransactionInterceptor (@Transactional) acquires a connection before entering the annotated method :382 :388
  • 31.
    38 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Lock Contention Automatic analysis by Java Mission Control of the recorded .jfr file: https://wiki.openjdk.org/display/jmc/Releases ß Waiting to enter a synchronized method à (native C code is not recorded by JFR)
  • 32.
    39 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community HashSet<Long> largeSet = ...; ß ~10K items largeSet.remove(13); ß constant time O(1) List<Long> largerList = ...; ß ~10K items largeSet.removeAll(largerList); * (N x remove) * Expected to run super-fast O(size(list)) * In Production, with 10.000 items ... 2 SECONDS !!!! WHY?!! Gory Library QUIRKS
  • 33.
    40 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Flamegraph shows: HashSet<Long> largeSet = ...; ß ~10K items largeSet.remove(13); ß constant time O(1) List<Long> largerList = ...; ß ~10K items largeSet.removeAll(largerList); * 2 seconds are lost doing ArrayList.contains !! Gory Library QUIRKS
  • 34.
    41 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Gory Library QUIRKS HashSet<Long> largeSet = ...; ß ~10K items largeSet.remove(13); ß constant time O(1) List<Long> largerList = ...; ß ~10K items largeSet.removeAll(largerList); * If the collection C passed as parameter is larger then current HashSet, iterate the elements of the HashSet and remove them if C.contains(e) = O(N), since in our case, C was a List
  • 35.
    43 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Best Feature of Java 21 LTS? 🎉 🎉 🎉 GA on 19 sep 2023 🎉 🎉 🎉 Virtual Threads
  • 36.
    44 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community JFR recording of ❤Virtual Threads❤ in a Spring Boot 3 app running on Java 19/20
  • 37.
    45 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community JFR Profiling – Lessons Learned ✅ "Sees" inside unknown code / library ❌ Can't see into naAve C code ⚠ 1 sample/second è requires load tests or ran in prod ✅ Can be used in produc5on (x-low overhead) ⚠ Tuning requires paAence, expectaAons, and library awareness ⚠ Sampling Biases and JIT-influenced if profiling CPU-bound code ❌ Cannot follow async calls/reacAve chains (change threads) ✅ Can record Virtual Threads[JEP-444 in Java 21 LTS] and thread forking done using Structured Concurrency[JEP-428 in Java19+]
  • 38.
    47 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Conclusions §Zoom in on the performance issue via distributed tracing (Zipkin) §Study default metrics collected by Micrometer & expose more §If bo_leneck is sAll unclear, or you can't use above tools ⬆, record prod or load-tests using JFR - Load the .jfr recording in IntelliJ or in JMC for automa6c analysis §OpAmize code & config, measuring the benefit vs. risk/pain
  • 39.
    Profiling your JavaApplica2on victor.rentea@gmail.com ♦ ♦ @victorrentea ♦ VictorRentea.ro Find the code at: https://github.com/victorrentea/performance-profiling.git Join us! victorrentea.ro/community