INTRODUCTION TO JAVA PROFILING
Jerry Yoakum
Expedia Affiliate Network
AGENDA
• When to profile
• Profiler Sampling
• Profiler Instrumentation
• Where to Start
• Examples
• Micro vs Macro Benchmarking
WHEN TO PROFILE
• When a performance issue is unclear.
• To proactively check that an application is performing as expected.
• To turbo-charge an application?
“We should forget about small efficiencies,
say about 97% of the time; premature
optimization is the root of all evil.”
– DONALD KNUTH
The point that Knuth is trying to make is that in the end, you should write “clean, straightforward code that is simple to read and understand.” In this context, “optimizing” is understood to mean employing algorithmic and design changes that complicate program structure but provide better performance. Those kinds of optimizations are indeed best left undone until profiling shows that there is a large benefit from performing them.
if (LOG.isTraceEnabled()) {
    LOG.trace(String.format("X: %s and Y: %s", calcX(), calcY()));
}
BEST PRACTICES ARE NOT PREMATURE OPTIMIZATIONS
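To make the guarded-logging slide concrete, here is a minimal sketch of why the guard counts as a best practice rather than an optimization. It assumes `java.util.logging` (the slide's `LOG` could be any logging framework), and `calcX()` and the call counter are hypothetical stand-ins for an expensive calculation.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: guarding an expensive log message. java.util.logging is an
// assumption; the slide does not say which logging framework LOG comes from.
class GuardedLogging {
    static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    // Counts how often the (hypothetically expensive) calculation runs.
    static int calcCalls = 0;

    static int calcX() {
        calcCalls++;
        return 42;
    }

    static void logGuarded() {
        // The arguments are only evaluated when FINEST (trace-level) logging is on.
        if (LOG.isLoggable(Level.FINEST)) {
            LOG.finest(String.format("X: %s", calcX()));
        }
    }

    static void logUnguarded() {
        // String.format and calcX() run even if the message is then discarded.
        LOG.finest(String.format("X: %s", calcX()));
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // trace-level output disabled
        logGuarded();
        System.out.println("calls after guarded log: " + calcCalls);   // 0
        logUnguarded();
        System.out.println("calls after unguarded log: " + calcCalls); // 1
    }
}
```

The guard costs one cheap check and keeps the code readable, which is what separates it from the premature optimizations listed next.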
PREMATURE OPTIMIZATIONS INCLUDE…
• Manually inlining methods.
• Writing directly in bytecode.
• Allocating public variables and using them as global memory throughout an application.
• And anything else that makes the code unduly difficult to work with.
TOOLS!
• vmstat
• iostat
“Performance analysis is all about visibility—knowing what is going on inside of an application, and in the application’s environment. Visibility is all about tools. And so
performance tuning is all about tools.”
OVERLOADED MACHINE
• $ vmstat 1
• ‘r’ column is the run queue length
• the number of all threads that are
running or that could run if there were
an available CPU
• if the run queue length is too high for
any significant period of time, it is an
indication that the machine is
overloaded
VMSTAT EXAMPLE FOR A LOW USAGE SYSTEM
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 867632 38568 165348 0 0 453 20 236 271 3 5 91 1 0
0 0 0 867632 38568 165348 0 0 0 0 161 247 0 1 99 0 0
0 0 0 867632 38568 165348 0 0 0 0 140 240 0 1 99 0 0
0 0 0 867632 38568 165348 0 0 0 0 152 255 0 1 99 0 0
1 0 0 867632 38568 165348 0 0 0 0 147 240 0 1 99 0 0
VMSTAT EXAMPLE FOR A BUSY SYSTEM
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
12 0 82596 130020 130816 524228 0 0 0 0 2696 4644 84 12 4 0 0
12 0 83288 149288 129784 517476 32 692 32 692 3722 4536 85 14 1 0 0
14 0 83288 130248 129784 522520 0 0 0 0 2644 5128 87 13 0 0 0
0 2 83288 142548 129788 521936 64 0 64 40 1653 2748 53 8 20 20 0
13 0 86720 127480 125384 519344 32 3436 32 3436 4421 4671 76 12 6 5 0
17 1 87336 141932 124548 515632 64 616 64 632 3110 4302 87 13 1 0 0
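The run-queue check can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, the CPU count of 4 is assumed, and "run queue longer than the CPU count" is one possible reading of "too high", not a rule stated on the slides.

```java
import java.util.List;

// Sketch: flag vmstat samples whose run queue exceeds the CPU count.
class RunQueueCheck {
    // Returns the 'r' column (first field) of one vmstat data row.
    static int runQueue(String vmstatRow) {
        return Integer.parseInt(vmstatRow.trim().split("\\s+")[0]);
    }

    // Counts the rows whose run queue is longer than the number of CPUs.
    static long overloadedSamples(List<String> rows, int cpus) {
        return rows.stream().filter(row -> runQueue(row) > cpus).count();
    }

    public static void main(String[] args) {
        // Rows copied from the busy-system example above.
        List<String> rows = List.of(
            "12 0 82596 130020 130816 524228 0 0 0 0 2696 4644 84 12 4 0 0",
            "0 2 83288 142548 129788 521936 64 0 64 40 1653 2748 53 8 20 20 0",
            "17 1 87336 141932 124548 515632 64 616 64 632 3110 4302 87 13 1 0 0");
        int cpus = 4; // assumed CPU count for this illustration
        // prints "2 of 3 samples exceed 4 CPUs"
        System.out.println(overloadedSamples(rows, cpus) + " of " + rows.size()
            + " samples exceed " + cpus + " CPUs");
    }
}
```

A single high sample means little; what matters is the run queue staying high over a significant period.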
Examine Disk IO with iostat -xm 5
for a non-busy system
avg-cpu: %user %nice %system %iowait %steal %idle
22.84 0.00 1.00 0.01 0.00 76.14
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
vda 0.01 15.67 0.04 4.42 0.00 0.08 36.28 0.01 2.27 0.22 0.10
dm-0 0.00 0.00 0.77 0.56 0.00 0.00 8.00 0.01 4.89 0.36 0.05
dm-1 0.00 0.00 0.05 20.09 0.00 0.08 8.03 0.12 5.73 0.05 0.10
Examine Disk IO with iostat -xm 5
for a busy system
avg-cpu: %user %nice %system %iowait %steal %idle
86.20 0.00 13.50 0.00 0.10 0.20
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
vda 30.00 2.40 8.20 1.00 0.15 0.01 36.00 0.05 5.78 3.04 2.80
dm-0 0.00 0.00 0.20 3.20 0.00 0.01 8.00 0.05 15.53 4.00 1.36
dm-1 0.00 0.00 38.00 0.00 0.15 0.00 8.00 0.17 4.49 0.38 1.44
Is %idle low?
Examine Disk IO with iostat -xm 5
for a busy system
avg-cpu: %user %nice %system %iowait %steal %idle
16.20 0.00 83.50 0.00 0.10 0.20
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
vda 30.00 2.40 8.20 1.00 0.15 0.01 36.00 0.05 5.78 3.04 2.80
dm-0 0.00 0.00 0.20 3.20 0.00 0.01 8.00 0.05 15.53 4.00 1.36
dm-1 0.00 0.00 38.00 0.00 0.15 0.00 8.00 0.17 4.49 0.38 1.44
Is %system higher than %user?
Examine Disk IO with iostat -xm 5
for a busy system
avg-cpu: %user %nice %system %iowait %steal %idle
16.20 0.00 83.50 0.00 0.10 0.20
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
vda 30.00 2.40 8.20 1.00 0.15 0.01 36.00 0.05 5.78 3.04 2.80
dm-0 0.00 0.00 0.20 3.20 0.00 0.01 8.00 0.05 35.53 4.00 81.36
dm-1 0.00 0.00 38.00 0.00 0.15 0.00 8.00 0.17 4.49 0.38 1.44
Is a device being used more than others?
Examine Disk IO with iostat -xm 5
for a busy system
avg-cpu: %user %nice %system %iowait %steal %idle
16.20 0.00 83.50 0.00 0.10 0.20
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
vda 30.00 2.40 8.20 1.0 0.15 0.01 36.00 0.05 5.78 3.04 2.80
dm-0 0.00 0.00 0.20 63.2 0.00 0.01 8.00 0.05 35.53 4.00 81.36
dm-1 0.00 0.00 38.00 0.0 0.15 0.00 8.00 0.17 4.49 0.38 1.44
Are the w/s high while the wMB/s is low?
Examine Disk IO with iostat -xm 5
for a busy system
avg-cpu: %user %nice %system %iowait %steal %idle
16.20 0.00 83.50 0.00 0.10 0.20
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
vda 30.00 2.40 8.20 1.0 0.15 0.01 36.00 0.05 5.78 3.04 2.80
dm-0 0.00 0.00 0.20 63.2 0.00 0.01 8.00 0.05 35.53 4.00 81.36
dm-1 0.00 0.00 38.00 0.0 0.15 0.00 8.00 0.17 4.49 0.38 1.44
Is await high for a device?
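The per-device questions above can be expressed as simple checks on one `iostat -x` row. The sketch below assumes the column order of the header shown in these examples (other sysstat versions print different columns), and every threshold in it is an illustrative assumption rather than a recommended value.

```java
// Sketch: apply the slides' questions to one iostat -x device row.
class IostatRow {
    final String device;
    final double writesPerSec, writeMBPerSec, await, utilPercent;

    IostatRow(String row) {
        // Column order assumed to match the header shown above:
        // Device rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
        String[] f = row.trim().split("\\s+");
        device = f[0];
        writesPerSec = Double.parseDouble(f[4]);
        writeMBPerSec = Double.parseDouble(f[6]);
        await = Double.parseDouble(f[9]);
        utilPercent = Double.parseDouble(f[11]);
    }

    // Many small writes: high w/s but little data written (illustrative thresholds).
    boolean manySmallWrites() {
        return writesPerSec > 50 && writeMBPerSec < 1.0;
    }

    // Device is heavily used (illustrative threshold).
    boolean busy() {
        return utilPercent > 80;
    }

    // Requests wait a long time, in milliseconds (illustrative threshold).
    boolean slow() {
        return await > 20;
    }

    public static void main(String[] args) {
        IostatRow dm0 = new IostatRow(
            "dm-0 0.00 0.00 0.20 63.2 0.00 0.01 8.00 0.05 35.53 4.00 81.36");
        // prints "dm-0 busy=true slow=true manySmallWrites=true"
        System.out.println(dm0.device + " busy=" + dm0.busy()
            + " slow=" + dm0.slow() + " manySmallWrites=" + dm0.manySmallWrites());
    }
}
```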
PROFILER SAMPLING
• Sampling-based profilers are the most common kind of profiler.
• Because of their relatively low profile, sampling profilers introduce fewer
measurement artifacts.
• Different sampling profilers behave differently; each may be better for a
particular application.
Sampling profilers probe the program counter at regular intervals using operating system interrupts. Sampling profilers are less accurate but facilitate a near normal
execution time.
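The idea of sampling can be sketched in plain Java. This toy sampler is not how YourKit or any real profiler is implemented: it uses `Thread.getStackTrace()` as a simple stand-in for interrupt-driven sampling, and `TinySampler`, `spin()`, and the sample count and interval are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a sampling profiler: periodically record which method a thread is in.
class TinySampler {
    // Takes `samples` snapshots of `target`, `intervalMs` apart, and counts
    // how often each method was on top of the stack.
    static Map<String, Integer> sample(Thread target, int samples, long intervalMs) {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // stop sampling if the sampler itself is interrupted
            }
        }
        return hits;
    }

    static volatile boolean running = true;
    static volatile long sink;

    // Hypothetical workload that just burns CPU.
    static void spin() {
        long x = 0;
        while (running) {
            x += System.nanoTime() % 7;
            sink = x;
        }
    }

    public static void main(String[] args) {
        Thread worker = new Thread(TinySampler::spin, "worker");
        worker.setDaemon(true);
        worker.start();
        Map<String, Integer> hits = sample(worker, 50, 10);
        running = false;
        hits.forEach((method, count) -> System.out.println(count + "  " + method));
    }
}
```

The sampler never sees what happens between two samples, which is why it is cheap and also why it is less exact than instrumentation.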
SAMPLING
main()
prog()
s()
con()
SAMPLING
SAFEPOINTS
Sampling profilers in Java can often take a sample of a thread only when the thread is at a safepoint (for example, when it is allocating memory), so code that runs between safepoints can be under-represented.
PROFILER INSTRUMENTATION
• Instrumented profilers yield more information about an application, but
can possibly have a greater effect on the application than a sampling
profiler.
• Instrumented profilers should be set up to instrument small sections of the
code—a few classes or packages. That limits their impact on the
application’s performance.
An instrumented profiler adds instructions to the code to gather data about what was executed, when, for how long, and so on.
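What "adding instructions" means can be sketched by hand. Real instrumenting profilers rewrite bytecode; the wrapper below is a hypothetical source-level stand-in that only illustrates the entry and exit probes and the bookkeeping they feed.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of what instrumentation adds around a method: entry/exit bookkeeping.
class ManualInstrumentation {
    static final Map<String, Long> callCounts = new HashMap<>();
    static final Map<String, Long> totalNanos = new HashMap<>();

    // Runs `body` and records one call plus its elapsed time under `name`.
    static void instrumented(String name, Runnable body) {
        long start = System.nanoTime();               // added: entry probe
        try {
            body.run();
        } finally {
            long elapsed = System.nanoTime() - start; // added: exit probe
            callCounts.merge(name, 1L, Long::sum);
            totalNanos.merge(name, elapsed, Long::sum);
        }
    }

    static long sink;

    // Hypothetical tiny method, the kind the JIT would normally inline.
    static void con() {
        sink++;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000_000; i++) {
            instrumented("con", ManualInstrumentation::con);
        }
        System.out.println("con() calls: " + callCounts.get("con"));
        System.out.println("con() total ns, probes included: " + totalNanos.get("con"));
    }
}
```

For a method as small as `con()`, the two `nanoTime()` calls and the map updates likely cost more than the method body, so the recorded time says more about the probes than about the method.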
INSTRUMENTATION IMPACT
Instrumented code may change the execution profile.
For example, the JVM will inline small methods so that no method invocation is needed when the small-method code is executed. The compiler makes that decision
based on the size of the code; depending on how the code is instrumented, it may no longer be eligible to be inlined. This may cause the instrumented profiler to
overestimate the contribution of certain methods. And inlining is just one example of a decision that the compiler makes based on the layout of the code; in general, the
more the code is instrumented (changed), the more likely it is that its execution profile will change.
INSTRUMENTED
main()
prog()
s()
con()
The thing to notice is that the instrumentation overhead is potentially greater than the cost of con() itself; because that overhead is added to con(), the method appears to have a greater impact than it really does.
PROFILE THE CPU FIRST
• CPU time is the first thing to examine when looking at performance of an
application.
• The goal in optimizing code is to drive the CPU usage up (for a shorter
period of time), not down.
• Understand why CPU usage is low before diving in and attempting to tune
an application.
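One way to see the difference between "slow" and "CPU-bound" is to compare CPU time with elapsed time for the same piece of work. The sketch below uses the standard `ThreadMXBean`; the 200 ms sleep is a hypothetical stand-in for a thread that is blocked on I/O or a lock. On JVMs where thread CPU time is unsupported or disabled, `getCurrentThreadCpuTime()` throws or returns -1, which this sketch does not handle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Sketch: compare CPU time with elapsed time for a piece of work.
class CpuVersusWall {
    // Returns {cpuNanos, wallNanos} for running `work` on the current thread.
    static long[] measure(Runnable work) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long cpuStart = threads.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        work.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = threads.getCurrentThreadCpuTime() - cpuStart;
        return new long[] {cpu, wall};
    }

    // Hypothetical "blocked" workload: waits instead of computing.
    static void waitForSomething() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] t = measure(CpuVersusWall::waitForSomething);
        System.out.printf("cpu=%d ms, wall=%d ms%n", t[0] / 1_000_000, t[1] / 1_000_000);
    }
}
```

When CPU time is a small fraction of elapsed time, the thread spent most of its time waiting, and the next question is what it was waiting for.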
PROFILE THE CPU FIRST
In the heat of battle, it can be tough to choose your targets. I’m sympathetic to that. You see lots of garbage collections with a big heap, you want to profile the memory right away! But I’m asking you… no, I’m begging you. For the love of Java. People. Profile the CPU. The CPU. This CPU right here! Profile the CPU first!
LIMIT WASTE EXAMPLE
static volatile Long value = 0L;
…
private static void waste() {
    for (Long count = 0L; count < 500_000_000; count++) {
        value += count;
    }
}
START LIMIT WASTE WITH AGENT ATTACHED
$ java -agentpath:libyjpagent.jnilib LimitWaste
[YourKit Java Profiler 2015 build 15042]
Log file: /Users/jyoakum/.yjp/log/LimitWaste-4096.log
Press enter to continue.
YOURKIT JAVA PROFILER
YOURKIT - CHOOSE APPLICATION
YOURKIT - STARTS WITH STACK TELEMETRY
YOURKIT - START SAMPLING
CONTINUE PROCESSING OF LIMIT WASTE
$ java -agentpath:libyjpagent.jnilib LimitWaste
[YourKit Java Profiler 2015 build 15042]
Log file: /Users/jyoakum/.yjp/log/LimitWaste-4096.log
Press enter to continue.
124999999750000000 after 7827.359 ms
Press enter to finish.
YOURKIT - STOP SAMPLING
YOURKIT - ANALYZE CALL TREE
LIMIT ALLOCATION WASTE EXAMPLE
static volatile Long value = 0L;
…
private static void waste() {
    // What autoboxing effectively does: box a Long on each update.
    for (Long count = 0L; count < 500_000_000; count = Long.valueOf(count + 1)) {
        value = Long.valueOf(value + count);
    }
}
YOURKIT - PERF CHART FOR GC
YOURKIT - PERF CHART FOR ALLOCATION
LIMIT ALLOCATION WASTE EXAMPLE
static volatile Long value = 0L;
…
private static void lessWaste() {
    for (long count = 0; count < 500_000_000; count++) {
        value = Long.valueOf(value + count);
    }
}
LIMIT WASTE IMPROVED
$ java -agentpath:libyjpagent.jnilib LimitWaste
[YourKit Java Profiler 2015 build 15042]
Log file: /Users/jyoakum/.yjp/log/LimitWaste-4096.log
Press enter to continue.
124999999750000000 after 14833.461 ms
Press enter to continue.
124999999750000000 after 8551.391 ms
Press enter to finish.
YOURKIT - LIMIT WASTE IMPROVED
LIMIT ALLOCATION WASTE EXAMPLE
static volatile Long value = 0L;
…
private static void haste() {
    long fastValue = 0L;
    for (long count = 0; count < 500_000_000; count++) {
        fastValue += count;
    }
    value = fastValue;
}
LIMIT WASTE - MAKE HASTE
$ java -agentpath:libyjpagent.jnilib LimitWaste
[YourKit Java Profiler 2015 build 15042]
Log file: /Users/jyoakum/.yjp/log/LimitWaste-4096.log
Press enter to continue.
124999999750000000 after 14833.461 ms
Press enter to continue.
124999999750000000 after 8551.391 ms
Press enter to continue.
124999999750000000 after 266.119 ms
Press enter to finish.
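The three variants can be put side by side in one small harness. This is a sketch, not the original LimitWaste program (its full source is not shown on the slides): the `time` helper, the iteration parameter, and the reduced loop count are assumptions made so the example runs quickly.

```java
// Sketch of a harness for the three variants shown above.
class LimitWasteSketch {
    static volatile Long value = 0L;

    // Boxed loop counter and boxed accumulator.
    static void waste(long iterations) {
        value = 0L;
        for (Long count = 0L; count < iterations; count++) {
            value += count;
        }
    }

    // Primitive loop counter, but the accumulator is still boxed.
    static void lessWaste(long iterations) {
        value = 0L;
        for (long count = 0; count < iterations; count++) {
            value = Long.valueOf(value + count);
        }
    }

    // Primitive counter and accumulator; box once at the end.
    static void haste(long iterations) {
        long fastValue = 0L;
        for (long count = 0; count < iterations; count++) {
            fastValue += count;
        }
        value = fastValue;
    }

    // Runs one variant and prints its result and elapsed time.
    static void time(String name, Runnable variant) {
        long start = System.nanoTime();
        variant.run();
        double ms = (System.nanoTime() - start) / 1_000_000.0;
        System.out.println(name + ": " + value + " after " + ms + " ms");
    }

    public static void main(String[] args) {
        long n = 50_000_000L; // smaller than the slides' 500_000_000 to keep the run short
        time("waste", () -> waste(n));
        time("lessWaste", () -> lessWaste(n));
        time("haste", () -> haste(n));
    }
}
```

All three variants compute the same sum; only the amount of boxing differs, so any timing gap comes from allocation and garbage collection rather than from the arithmetic.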
YOURKIT - LIMIT WASTE - MAKE HASTE
THREAD PROFILING
• Thread profiling is concerned with examining the different thread states.
• If threads are blocked most of the time then execution power is reduced.
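The raw data behind a thread profile is simply each thread's state over time. A minimal sketch of one such snapshot, using only standard `Thread` APIs (the class name is hypothetical, and a real profiler would take many snapshots and chart them):

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch: count live threads by state, the raw data behind a thread profile.
class ThreadStateCount {
    static Map<Thread.State, Integer> countStates() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countStates().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

A snapshot dominated by `BLOCKED` or `WAITING` threads suggests the application is limited by contention or waiting rather than by CPU.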
THREAD PROFILING EXAMPLE
ExecutorService execSvc = Executors.newFixedThreadPool(200);
for (int i = 0; i < 1000; i++) {
    execSvc.execute(new SortingThread());
}
execSvc.shutdown();
execSvc.awaitTermination(5, TimeUnit.MINUTES);
THREAD PROFILING EXAMPLE
class SortingThread implements Runnable {
    @Override
    public void run() {
        System.out.println("starting...");
        int arraySize = 300_000;
        int[] bigArray = new int[arraySize];
        // populate the array with random numbers
        for (int i = 0; i < arraySize; i++) {
            bigArray[i] = ThreadLocalRandom.current().nextInt(50_000);
        }
        Arrays.sort(bigArray);
        System.out.println("finished!");
    }
}
THREAD PROFILING EXAMPLE
$ java -agentpath:libyjpagent.jnilib ThreadExample
[YourKit Java Profiler 2015 build 15042]
Log file: /Users/jyoakum/.yjp/log/ThreadExample-90362.log
Press enter to continue.
starting…
…
finished!
Complete after 9041.103 ms
Press enter to finish.
THREAD PROFILING EXAMPLE - YOURKIT
The key thing to notice here is that the percentages of time under run() add up to only 56%, leaving about 43% unaccounted for…
THREAD PROFILING EXAMPLE - JMC
• JMC (Java Mission Control)
• Low overhead - built into the JVM
• Commercial feature that requires license agreements for production use
THREAD PROFILING EXAMPLE - JMC
$ java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder ThreadExample
Press enter to continue.
starting…
…
finished!
Complete after 4965.916 ms
Press enter to finish.
THREAD PROFILING EXAMPLE - SMALLER POOL
• Originally used a pool size of 200 threads.
• Using a pool size of 40 threads results in nearly the same run time and
some other benefits.
THREAD PROFILING EXAMPLE - SMALLER POOL
Before, we had multiple threads blocked. Now we are waiting to create threads.
THREAD PROFILING EXAMPLE - SMALLER POOL
Before, we used nearly 256 MB of heap. Now we use just over 128 MB of heap.
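The pool-size comparison can be reproduced with a small parameterized version of the example. This is a sketch under assumptions: the class name, the `run` signature, and the completion counter are hypothetical, and the sorting task is inlined instead of using the `SortingThread` class.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: run the same sorting workload with different pool sizes.
class PoolSizeComparison {
    // Submits `tasks` sorting jobs to a fixed pool and returns how many finished.
    static int run(int poolSize, int tasks, int arraySize) {
        AtomicInteger finished = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int[] bigArray = new int[arraySize];
                for (int j = 0; j < arraySize; j++) {
                    bigArray[j] = ThreadLocalRandom.current().nextInt(50_000);
                }
                Arrays.sort(bigArray);
                finished.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return finished.get();
    }

    public static void main(String[] args) {
        // Pool sizes taken from the slides: 200 threads, then 40.
        for (int poolSize : new int[] {200, 40}) {
            long start = System.nanoTime();
            int done = run(poolSize, 1000, 300_000);
            double ms = (System.nanoTime() - start) / 1_000_000.0;
            System.out.println("pool=" + poolSize + " finished=" + done + " after " + ms + " ms");
        }
    }
}
```

Because the work is CPU-bound, threads beyond the number of cores mostly add memory for their arrays and stacks, which is consistent with the heap difference noted above.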
MICROBENCHMARKS
public void doTest() {
    double d;
    long then = System.currentTimeMillis();
    for (int i = 0; i < nLoops; i++) {
        d = fib(15);
    }
    long now = System.currentTimeMillis();
    System.out.println("Elapsed time: " + (now - then));
}

private double fib(int n) {
    if (n < 0) {
        throw new IllegalArgumentException("Must be >= 0");
    }
    if (n == 0) { return 0.0d; }
    if (n == 1) { return 1.0d; }
    double d = fib(n - 2) + fib(n - 1);
    if (Double.isInfinite(d)) {
        throw new ArithmeticException("Overflow");
    }
    return d;
}
MICROBENCHMARKS MUST USE THEIR RESULTS
A smart compiler will end up executing this code:
long then = System.currentTimeMillis();
long now = System.currentTimeMillis();
System.out.println("Elapsed time: " + (now - then));
Avoid compiler optimizations:
• Read each result.
• Use volatile instance variables.
There is a way around that particular issue: ensure that each result is read, not simply written. In practice, changing the definition of d from a local variable to an instance variable (declared with the volatile keyword) will allow the performance of the method to be measured.
WARM-UP PERIOD
For microbenchmarks, a warm-up period is
required; otherwise, the microbenchmark
is measuring the performance of
compilation rather than the code it is
attempting to measure.
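Both pieces of advice, keeping the result and warming up first, can be combined in a hand-rolled benchmark. This is a minimal sketch: the class name, the use of `System.nanoTime()`, and the loop counts are assumptions, and a dedicated benchmark harness would handle far more pitfalls than this does.

```java
// Sketch: a hand-rolled microbenchmark that keeps its result and warms up first.
class FibBenchmark {
    // Each result is written to a volatile field, so the JIT cannot
    // treat the loop body as dead code and drop it.
    private volatile double result;

    static double fib(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("Must be >= 0");
        }
        if (n == 0) { return 0.0d; }
        if (n == 1) { return 1.0d; }
        return fib(n - 2) + fib(n - 1);
    }

    // Runs fib(15) nLoops times and returns the elapsed time in milliseconds.
    long timedRun(int nLoops) {
        long then = System.nanoTime();
        for (int i = 0; i < nLoops; i++) {
            result = fib(15);
        }
        return (System.nanoTime() - then) / 1_000_000;
    }

    double lastResult() {
        return result;
    }

    public static void main(String[] args) {
        FibBenchmark bench = new FibBenchmark();
        bench.timedRun(20_000);                // warm-up: gives the JIT time to compile fib()
        long elapsed = bench.timedRun(20_000); // measured run
        System.out.println("Elapsed time: " + elapsed + " ms, fib(15) = " + bench.lastResult());
    }
}
```

How many warm-up iterations are enough depends on the JVM and the code under test; the 20,000 used here is an arbitrary illustrative value.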
MACROBENCHMARKS
No test can give comparable results
to examining an application in production.
The best thing to use to measure performance of an application “is the application itself, in conjunction with any external resources it uses.” If the application normally checks the credentials of a user by making LDAP calls, it should be tested in that mode. Stubbing out the LDAP calls may make sense for module-level testing, but the application must be tested in its full configuration.
SUMMARY
• When to profile
• Profiler Sampling
• Profiler Instrumentation
• Where to Start
• Examples
• Micro vs Macro Benchmarking
Yes, it is the same slide as the agenda slide.
Questions?

FIDO Alliance
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 

Recently uploaded (20)

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 

Introduction to Java Profiling

  • 1. INTRODUCTION TO JAVA PROFILING Jerry Yoakum, Expedia Affiliate Network
  • 2. AGENDA • When to profile • Profiler Sampling • Profiler Instrumentation • Where to Start • Examples • Micro vs Macro Benchmarking
  • 3. WHEN TO PROFILE • When a performance issue is unclear. • To proactively check that an application is performing as expected. • To turbo-charge an application?
  • 4. “We should forget about small efficiencies, say about 97% of the time; premature optimization is the root of all evil.” – DONALD KNUTH The point that Knuth is trying to make is that, in the end, you should write clean, straightforward code that is simple to read and understand. In this context, “optimizing” is understood to mean employing algorithmic and design changes that complicate program structure but provide better performance. Those kinds of optimizations are indeed best left undone until profiling of a program shows that there is a large benefit from performing them.
  • 5. BEST PRACTICES ARE NOT PREMATURE OPTIMIZATIONS if (LOG.isTraceEnabled()) { LOG.trace(String.format("X: %s and Y: %s", calcX(), calcY())); }
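The guard on slide 5 can be tried out with a minimal sketch. The slide does not say which logging library `LOG` comes from, so this version assumes `java.util.logging` (with `FINEST` standing in for trace), and the `calcCalls` counter and the bodies of `calcX()`/`calcY()` are hypothetical, added only to make the skipped work visible.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {
    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());
    static int calcCalls = 0; // hypothetical counter: how often the "expensive" helpers ran

    static {
        LOG.setLevel(Level.INFO); // trace-level (FINEST) output is disabled
    }

    // Stand-ins for the slide's calcX()/calcY(); imagine these are costly.
    static String calcX() { calcCalls++; return "x"; }
    static String calcY() { calcCalls++; return "y"; }

    // Arguments are evaluated and formatted even though nothing is logged.
    static void logUnguarded() {
        LOG.finest(String.format("X: %s and Y: %s", calcX(), calcY()));
    }

    // The level check skips both the helper calls and the formatting.
    static void logGuarded() {
        if (LOG.isLoggable(Level.FINEST)) {
            LOG.finest(String.format("X: %s and Y: %s", calcX(), calcY()));
        }
    }

    public static void main(String[] args) {
        logGuarded();
        System.out.println("helper calls after guarded log:   " + calcCalls);
        logUnguarded();
        System.out.println("helper calls after unguarded log: " + calcCalls);
    }
}
```

The guard costs one extra line and keeps the code readable, which is why it counts as a best practice rather than a premature optimization.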
  • 6. PREMATURE OPTIMIZATIONS INCLUDE … • Manually inlining methods. • Writing directly in bytecode. • Allocating public variables and using them as global memory throughout an application. • And anything else that makes the code unduly difficult to work with.
  • 7. TOOLS! • vmstat • iostat “Performance analysis is all about visibility, knowing what is going on inside of an application, and in the application’s environment. Visibility is all about tools. And so performance tuning is all about tools.”
  • 8. OVERLOADED MACHINE • $ vmstat 1 • ‘r’ column is the run queue length • the number of all threads that are running or that could run if there were an available CPU • if the run queue length is too high for any significant period of time, it is an indication that the machine is overloaded
  • 9. VMSTAT EXAMPLE FOR A LOW USAGE SYSTEM
    $ vmstat 1
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0      0 867632  38568 165348    0    0   453    20  236  271  3  5 91  1  0
     0  0      0 867632  38568 165348    0    0     0     0  161  247  0  1 99  0  0
     0  0      0 867632  38568 165348    0    0     0     0  140  240  0  1 99  0  0
     0  0      0 867632  38568 165348    0    0     0     0  152  255  0  1 99  0  0
     1  0      0 867632  38568 165348    0    0     0     0  147  240  0  1 99  0  0
  • 10. VMSTAT EXAMPLE FOR A BUSY SYSTEM
    $ vmstat 1
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
    12  0  82596 130020 130816 524228    0    0     0     0 2696 4644 84 12  4  0  0
    12  0  83288 149288 129784 517476   32  692    32   692 3722 4536 85 14  1  0  0
    14  0  83288 130248 129784 522520    0    0     0     0 2644 5128 87 13  0  0  0
     0  2  83288 142548 129788 521936   64    0    64    40 1653 2748 53  8 20 20  0
    13  0  86720 127480 125384 519344   32 3436    32  3436 4421 4671 76 12  6  5  0
    17  1  87336 141932 124548 515632   64  616    64   632 3110 4302 87 13  1  0  0
  • 11. Examine Disk IO with iostat -xm 5 for a non-busy system
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              22.84    0.00    1.00    0.01    0.00   76.14
    Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await   svctm   %util
    vda        0.01   15.67    0.04    4.42    0.00    0.08    36.28     0.01    2.27    0.22    0.10
    dm-0       0.00    0.00    0.77    0.56    0.00    0.00     8.00     0.01    4.89    0.36    0.05
    dm-1       0.00    0.00    0.05   20.09    0.00    0.08     8.03     0.12    5.73    0.05    0.10
  • 12. Examine Disk IO with iostat -xm 5 for a busy system
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              86.20    0.00   13.50    0.00    0.10    0.20
    Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await   svctm   %util
    vda       30.00    2.40    8.20    1.00    0.15    0.01    36.00     0.05    5.78    3.04    2.80
    dm-0       0.00    0.00    0.20    3.20    0.00    0.01     8.00     0.05   15.53    4.00    1.36
    dm-1       0.00    0.00   38.00    0.00    0.15    0.00     8.00     0.17    4.49    0.38    1.44
    Is %idle low?
  • 13. Examine Disk IO with iostat -xm 5 for a busy system
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              16.20    0.00   83.50    0.00    0.10    0.20
    Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await   svctm   %util
    vda       30.00    2.40    8.20    1.00    0.15    0.01    36.00     0.05    5.78    3.04    2.80
    dm-0       0.00    0.00    0.20    3.20    0.00    0.01     8.00     0.05   15.53    4.00    1.36
    dm-1       0.00    0.00   38.00    0.00    0.15    0.00     8.00     0.17    4.49    0.38    1.44
    Is %system higher than %user?
  • 14. Examine Disk IO with iostat -xm 5 for a busy system
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              16.20    0.00   83.50    0.00    0.10    0.20
    Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await   svctm   %util
    vda       30.00    2.40    8.20    1.00    0.15    0.01    36.00     0.05    5.78    3.04    2.80
    dm-0       0.00    0.00    0.20    3.20    0.00    0.01     8.00     0.05   35.53    4.00   81.36
    dm-1       0.00    0.00   38.00    0.00    0.15    0.00     8.00     0.17    4.49    0.38    1.44
    Is a device being used more than others?
  • 15. Examine Disk IO with iostat -xm 5 for a busy system
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              16.20    0.00   83.50    0.00    0.10    0.20
    Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await   svctm   %util
    vda       30.00    2.40    8.20     1.0    0.15    0.01    36.00     0.05    5.78    3.04    2.80
    dm-0       0.00    0.00    0.20    63.2    0.00    0.01     8.00     0.05   35.53    4.00   81.36
    dm-1       0.00    0.00   38.00     0.0    0.15    0.00     8.00     0.17    4.49    0.38    1.44
    Are the w/s high while the wMB/s is low?
  • 16. Examine Disk IO with iostat -xm 5 for a busy system
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              16.20    0.00   83.50    0.00    0.10    0.20
    Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await   svctm   %util
    vda       30.00    2.40    8.20     1.0    0.15    0.01    36.00     0.05    5.78    3.04    2.80
    dm-0       0.00    0.00    0.20    63.2    0.00    0.01     8.00     0.05   35.53    4.00   81.36
    dm-1       0.00    0.00   38.00     0.0    0.15    0.00     8.00     0.17    4.49    0.38    1.44
    Is await high for a device?
  • 17. PROFILER SAMPLING • Sampling-based profilers are the most common kind of profiler. • Because of their relatively low profile, sampling profilers introduce fewer measurement artifacts. • Different sampling profilers behave differently; each may be better for a particular application. Sampling profilers probe the program counter at regular intervals using operating system interrupts. They are less accurate but allow a near-normal execution time.
  • 18–24. SAMPLING (diagram repeated across seven slides) main() prog() s() con()
  • 25. SAMPLING SAFEPOINTS Sampling profilers in Java can only take the sample of a thread when the thread is at a safepoint: essentially, whenever it is allocating memory.
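To make the sampling idea concrete, here is a rough sketch of a toy sampler, not how YourKit or any real profiler is implemented. It polls a worker thread with the standard `Thread.getStackTrace()` call at a fixed interval and counts which method is on top of the stack. The class name `TinySampler`, the `busy()` workload and the 10 ms interval are assumptions made for illustration. Stack walking of this kind is assumed to be subject to the same safepoint limitation described on slide 25.

```java
import java.util.HashMap;
import java.util.Map;

public class TinySampler {
    static volatile long sink; // keeps the busy loop's result observable

    // Hypothetical CPU-bound workload that spins for roughly the given time.
    static void busy(long millis) {
        long end = System.nanoTime() + millis * 1_000_000L;
        long acc = 0;
        while (System.nanoTime() < end) {
            acc = acc * 31 + 7;
        }
        sink = acc;
    }

    // Poll the target thread until it finishes, counting the top stack frame each time.
    static Map<String, Integer> sample(Thread target, long intervalMillis)
            throws InterruptedException {
        Map<String, Integer> hits = new HashMap<>();
        while (target.isAlive()) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            Thread.sleep(intervalMillis);
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(() -> busy(300), "worker");
        worker.start();
        Map<String, Integer> hits = sample(worker, 10);
        hits.forEach((method, n) -> System.out.println(n + " samples in " + method));
    }
}
```

The sample counts are only an estimate of where time went: a method that runs entirely between two samples never shows up, which is the accuracy trade-off mentioned on slide 17.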
  • 26. PROFILER INSTRUMENTATION • Instrumented profilers yield more information about an application, but can have a greater effect on the application than a sampling profiler. • Instrumented profilers should be set up to instrument small sections of the code (a few classes or packages). That limits their impact on the application’s performance. An instrumented profiler adds instructions to the code to gather data about what was executed, when, and for how long.
  • 27. INSTRUMENTATION IMPACT Instrumented code may change the execution profile. For example, the JVM will inline small methods so that no method invocation is needed when the small-method code is executed. The compiler makes that decision based on the size of the code; depending on how the code is instrumented, it may no longer be eligible to be inlined. This may cause the instrumented profiler to overestimate the contribution of certain methods. And inlining is just one example of a decision that the compiler makes based on the layout of the code; in general, the more the code is instrumented (changed), the more likely it is that its execution profile will change.
  • 28–41. INSTRUMENTED (diagram repeated across fourteen slides) main() prog() s() con()
  • 42. INSTRUMENTED main() prog() s() con() Notice that the instrumentation overhead is potentially greater than the cost of con() itself, but because that overhead is attributed to con(), the method appears to have a greater impact than it really does.
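The effect on slide 42 can be illustrated with hand-written instrumentation. This is only a sketch of the idea: real instrumenting profilers rewrite bytecode rather than wrapping calls, and the `timed()` helper, the `stats` map and the body of `con()` are hypothetical. Every call to the tiny `con()` method now pays for two `System.nanoTime()` calls, a lambda invocation and a map lookup, all of which end up in the time charged to `con()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ManualInstrumentation {
    // method name -> {number of calls, total elapsed nanoseconds}
    static final Map<String, long[]> stats = new HashMap<>();

    // Wraps a call with timing code, the way an instrumenting profiler wraps a method body.
    static <T> T timed(String name, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            long[] s = stats.computeIfAbsent(name, k -> new long[2]);
            s[0]++;
            s[1] += elapsed;
        }
    }

    // A deliberately tiny method: the bookkeeping around it costs more than it does.
    static int con(int x) {
        return x + 1;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 1_000_000; i++) {
            final int arg = i;
            total += timed("con", () -> con(arg));
        }
        long[] s = stats.get("con");
        System.out.println("con(): " + s[0] + " calls, " + s[1] / 1_000_000 + " ms recorded");
        System.out.println("checksum: " + total);
    }
}
```

Limiting instrumentation to a few classes or packages, as slide 26 suggests, keeps this kind of distortion contained.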
  • 43. PROFILE THE CPU FIRST • CPU time is the first thing to examine when looking at performance of an application. • The goal in optimizing code is to drive the CPU usage up (for a shorter period of time), not down. • Understand why CPU usage is low before diving in and attempting to tune an application.
  • 44. PROFILE THE CPU FIRST In the heat of battle, it can be tough to choose your targets. I’m sympathetic to that. You see lots of garbage collections with a big heap, you want to profile the memory right away! But I’m asking you… no, I’m begging you. For the love of Java. People. Profile the CPU. The CPU. This CPU right here! Profile the CPU first!
  • 45. LIMIT WASTE EXAMPLE static volatile Long value = 0L; … private static void waste() { for (Long count = 0L; count < 500_000_000; count++) { value += count; } }
  • 46. START LIMITWASTE WITH AGENT ATTACHED $ java -agentpath:libyjpagent.jnilib LimitWaste [YourKit Java Profiler 2015 build 15042] Log file: /Users/jyoakum/.yjp/log/LimitWaste-4096.log Press enter to continue.
  • 47. YOURKIT JAVA PROFILER
  • 48. YOURKIT - CHOOSE APPLICATION
  • 49. YOURKIT - STARTS WITH STACK TELEMETRY
  • 50. YOURKIT - START SAMPLING
  • 51. CONTINUE PROCESSING OF LIMITWASTE $ java -agentpath:libyjpagent.jnilib LimitWaste [YourKit Java Profiler 2015 build 15042] Log file: /Users/jyoakum/.yjp/log/LimitWaste-4096.log Press enter to continue. 124999999750000000 after 7827.359 ms Press enter to finish.
  • 52. YOURKIT - STOP SAMPLING
  • 53. YOURKIT - ANALYZE CALL TREE
  • 54. LIMIT WASTE EXAMPLE static volatile Long value = 0L; … private static void waste() { for (Long count = 0L; count < 500_000_000; count++) { value += count; } }
  • 55. LIMIT ALLOCATION WASTE EXAMPLE static volatile Long value = 0L; … private static void waste() { for (Long count = 0L; count < 500_000_000; count = Long.valueOf(count + 1)) { value = Long.valueOf(value + count); } }
  • 58. YOURKIT - PERF CHART FOR GC
  • 59. YOURKIT - PERF CHART FOR ALLOCATION
  • 60. LIMIT ALLOCATION WASTE EXAMPLE static volatile Long value = 0L; … private static void waste() { for (Long count = 0L; count < 500_000_000; count = Long.valueOf(count + 1)) { value = Long.valueOf(value + count); } }
  • 61. LIMIT ALLOCATION WASTE EXAMPLE static volatile Long value = 0L; … private static void lessWaste() { for (long count = 0; count < 500_000_000; count++) { value = Long.valueOf(value + count); } }
  • 62. LIMITWASTE IMPROVED $ java -agentpath:libyjpagent.jnilib LimitWaste [YourKit Java Profiler 2015 build 15042] Log file: /Users/jyoakum/.yjp/log/LimitWaste-4096.log Press enter to continue. 124999999750000000 after 14833.461 ms Press enter to continue. 124999999750000000 after 8551.391 ms Press enter to finish.
  • 63. YOURKIT - LIMITWASTE IMPROVED
  • 64. LIMIT ALLOCATION WASTE EXAMPLE static volatile Long value = 0L; … private static void lessWaste() { for (long count = 0; count < 500_000_000; count++) { value = Long.valueOf(value + count); } }
  • 65. LIMIT ALLOCATION WASTE EXAMPLE static volatile Long value = 0L; … private static void haste() { long fastValue = 0L; for (long count = 0; count < 500_000_000; count++) { fastValue += count; } value = fastValue; }
  • 66. LIMITWASTE - MAKE HASTE $ java -agentpath:libyjpagent.jnilib LimitWaste [YourKit Java Profiler 2015 build 15042] Log file: /Users/jyoakum/.yjp/log/LimitWaste-4096.log Press enter to continue. 124999999750000000 after 14833.461 ms Press enter to continue. 124999999750000000 after 8551.391 ms Press enter to continue. 124999999750000000 after 266.119 ms Press enter to finish.
  • 67. YOURKIT - LIMITWASTE - MAKE HASTE
  • 68. YOURKIT - LIMITWASTE - MAKE HASTE
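The three variants from the slides can be put side by side in one runnable class. This is a sketch reconstructed from the slide fragments, not the author's full `LimitWaste` program: the class name, the `n` parameter, the return values and the timing code in `main` are assumptions, and the loop bound is reduced so it finishes quickly without a profiler attached.

```java
public class LimitWasteSketch {
    static volatile Long value = 0L;

    // Boxed counter and boxed accumulator: allocates on nearly every iteration.
    static long waste(long n) {
        value = 0L;
        for (Long count = 0L; count < n; count++) {
            value += count;
        }
        return value;
    }

    // Primitive counter, but the volatile boxed accumulator is still rewritten each time.
    static long lessWaste(long n) {
        value = 0L;
        for (long count = 0; count < n; count++) {
            value = Long.valueOf(value + count);
        }
        return value;
    }

    // Primitive local accumulator; the volatile field is written once at the end.
    static long haste(long n) {
        long fastValue = 0L;
        for (long count = 0; count < n; count++) {
            fastValue += count;
        }
        value = fastValue;
        return value;
    }

    public static void main(String[] args) {
        long n = 50_000_000L; // smaller than the slides' 500_000_000
        long t0 = System.nanoTime();
        long a = waste(n);
        long t1 = System.nanoTime();
        long b = lessWaste(n);
        long t2 = System.nanoTime();
        long c = haste(n);
        long t3 = System.nanoTime();
        System.out.printf("waste:     %d after %.3f ms%n", a, (t1 - t0) / 1e6);
        System.out.printf("lessWaste: %d after %.3f ms%n", b, (t2 - t1) / 1e6);
        System.out.printf("haste:     %d after %.3f ms%n", c, (t3 - t2) / 1e6);
    }
}
```

All three methods return the same sum, n(n-1)/2, so any difference in run time comes from boxing, allocation and volatile writes rather than from the arithmetic.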
  • 69. THREAD PROFILING • Thread profiling is concerned with examining the different thread states. • If threads are blocked most of the time then execution power is reduced.
  • 70. THREAD PROFILING EXAMPLE ExecutorService execSvc = Executors.newFixedThreadPool(200); for (int i = 0; i < 1000; i++) { execSvc.execute(new SortingThread()); } execSvc.shutdown(); execSvc.awaitTermination(5, TimeUnit.MINUTES);
  • 71. THREAD PROFILING EXAMPLE class SortingThread implements Runnable { @Override public void run() { System.out.println("starting..."); int arraySize = 300_000; int[] bigArray = new int[arraySize]; // populate the array with random numbers for (int i = 0; i < arraySize; i++) { bigArray[i] = ThreadLocalRandom.current().nextInt(50_000); } Arrays.sort(bigArray); System.out.println("finished!"); } }
  • 72. THREAD PROFILING EXAMPLE $ java -agentpath:libyjpagent.jnilib ThreadExample [YourKit Java Profiler 2015 build 15042] Log file: /Users/jyoakum/.yjp/log/ThreadExample-90362.log Press enter to continue. starting… … finished! Complete after 9041.103 ms Press enter to finish.
  • 73. THREAD PROFILING EXAMPLE - YOURKIT The key thing to notice here is that the percentage of time under run() only adds up to 56%, leaving 43% unaccounted for…
  • 74. THREAD PROFILING EXAMPLE - YOURKIT
  • 75. THREAD PROFILING EXAMPLE - YOURKIT
  • 76. THREAD PROFILING EXAMPLE - JMC • JMC (Java Mission Control) • Low overhead - built into the JVM • Commercial feature that requires license agreements for production use
  • 77. THREAD PROFILING EXAMPLE - JMC $ java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder ThreadExample Press enter to continue. starting… … finished! Complete after 4965.916 ms Press enter to finish.
  • 78. THREAD PROFILING EXAMPLE - JMC
  • 79. THREAD PROFILING EXAMPLE - JMC
  • 80. THREAD PROFILING EXAMPLE - JMC
  • 81. THREAD PROFILING EXAMPLE - JMC
  • 82. THREAD PROFILING EXAMPLE - SMALLER POOL • Originally used a pool size of 200 threads. • Using a pool size of 40 threads results in nearly the same run time and some other benefits.
  • 83. THREAD PROFILING EXAMPLE - SMALLER POOL Before, we had multiple threads blocked. Now we are waiting to create threads.
  • 84. THREAD PROFILING EXAMPLE - SMALLER POOL Before we used nearly 256 MB of heap. Now we used just over 128 MB of heap.
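The pool-size comparison can be rerun with a small harness. This is a sketch assembled from the fragments on slides 70 and 71, under a few assumptions: the class name, the `runWithPool()` helper, the `finished` counter and the reduced task count are all hypothetical, and the console printing from `SortingThread` is left out.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSizeSketch {
    static final AtomicInteger finished = new AtomicInteger();

    // Runs `tasks` sort jobs on a fixed pool and returns the elapsed time in milliseconds.
    static long runWithPool(int poolSize, int tasks, int arraySize)
            throws InterruptedException {
        ExecutorService execSvc = Executors.newFixedThreadPool(poolSize);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            execSvc.execute(() -> {
                int[] bigArray = new int[arraySize];
                // populate the array with random numbers, as on slide 71
                for (int j = 0; j < arraySize; j++) {
                    bigArray[j] = ThreadLocalRandom.current().nextInt(50_000);
                }
                Arrays.sort(bigArray);
                finished.incrementAndGet();
            });
        }
        execSvc.shutdown();
        execSvc.awaitTermination(5, TimeUnit.MINUTES);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pool of 200: " + runWithPool(200, 1000, 300_000) + " ms");
        System.out.println("pool of 40:  " + runWithPool(40, 1000, 300_000) + " ms");
        System.out.println("tasks finished: " + finished.get());
    }
}
```

Because the work is CPU-bound, threads beyond the number of available cores mostly add memory use and scheduling overhead, which is consistent with the heap figures on slide 84. Actual timings depend on the machine.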
  • 85. MICROBENCHMARKS public void doTest() { double d; long then = System.currentTimeMillis(); for (int i = 0; i < nLoops; i++) { d = fib(15); } long now = System.currentTimeMillis(); System.out.println("Elapsed time: " + (now - then)); } private double fib(int n) { if (n < 0) { throw new IllegalArgumentException("Must be >= 0"); } if (n == 0) { return 0.0d; } if (n == 1) { return 1.0d; } double d = fib(n - 2) + fib(n - 1); if (Double.isInfinite(d)) { throw new ArithmeticException("Overflow"); } return d; }
  • 86. MICROBENCHMARKS MUST USE THEIR RESULTS A smart compiler will end up executing this code: long then = System.currentTimeMillis(); long now = System.currentTimeMillis(); System.out.println("Elapsed time: " + (now - then)); Avoid compiler optimizations: • Read each result. • Use volatile instance variables. There is a way around that particular issue: ensure that each result is read, not simply written. In practice, changing the definition of d from a local variable to an instance variable (declared with the volatile keyword) will allow the performance of the method to be measured.
  • 87. WARM-UP PERIOD For microbenchmarks, a warm-up period is required; otherwise, the microbenchmark is measuring the performance of compilation rather than the code it is attempting to measure.
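Both pieces of advice, the volatile instance variable and the warm-up period, can be combined in one sketch. The class name, the `timeLoops()` method and the loop counts are assumptions chosen for illustration, and `System.nanoTime()` is swapped in for the slides' `System.currentTimeMillis()` for finer resolution. A hand-rolled loop like this is still fragile; a dedicated harness such as JMH handles these concerns more rigorously.

```java
public class FibBenchSketch {
    // Volatile instance field: every result must actually be written,
    // so the JIT cannot discard the fib() calls as dead code.
    private volatile double d;

    static double fib(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("Must be >= 0");
        }
        if (n == 0) {
            return 0.0d;
        }
        if (n == 1) {
            return 1.0d;
        }
        double result = fib(n - 2) + fib(n - 1);
        if (Double.isInfinite(result)) {
            throw new ArithmeticException("Overflow");
        }
        return result;
    }

    // Runs the loop and returns the elapsed time in nanoseconds.
    long timeLoops(int nLoops) {
        long then = System.nanoTime();
        for (int i = 0; i < nLoops; i++) {
            d = fib(15);
        }
        return System.nanoTime() - then;
    }

    public static void main(String[] args) {
        FibBenchSketch bench = new FibBenchSketch();
        long warmUp = bench.timeLoops(20_000);   // includes interpretation and JIT compilation
        long measured = bench.timeLoops(20_000); // closer to steady-state performance
        System.out.println("warm-up run:  " + warmUp / 1_000_000.0 + " ms");
        System.out.println("measured run: " + measured / 1_000_000.0 + " ms");
    }
}
```

On most JVMs the second run is expected to be noticeably faster than the first, which is the compilation cost the warm-up period is meant to exclude.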
  • 88. MACROBENCHMARKS No test can give comparable results to examining an application in production. The best thing to use to measure the performance of an application is the application itself, in conjunction with any external resources it uses. If the application normally checks the credentials of a user by making LDAP calls, it should be tested in that mode. Stubbing out the LDAP calls may make sense for module-level testing, but the application must be tested in its full configuration.
  • 89. SUMMARY • When to profile • Profiler Sampling • Profiler Instrumentation • Where to Start • Examples • Micro vs Macro Benchmarking Yes, it is the same slide as the agenda slide.