14. Branch Prediction
● Performance of an if-statement depends on whether its
condition has a predictable pattern.
● A “bad” true-false pattern can make an if-statement up
to six times slower than a “good” pattern!
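A minimal sketch of the effect, assuming a simple threshold check as the branch. The class and method names (`BranchPredictionSketch`, `sumAtLeast`, `randomValues`) are made up for illustration. The same data is summed once in random order and once sorted: the result is identical, only the true-false pattern of the condition changes. The timing in `main` is crude, and the JIT may compile the branch into a conditional move, so the difference may be smaller than the figure above or absent.

```java
import java.util.Arrays;
import java.util.Random;

class BranchPredictionSketch {

    // Sums only the values that pass the check; the `if` is the branch under study.
    static long sumAtLeast(int[] data, int threshold) {
        long sum = 0;
        for (int value : data) {
            if (value >= threshold) {
                sum += value;
            }
        }
        return sum;
    }

    // Pseudo-random values in [0, 256), reproducible through the seed.
    static int[] randomValues(int size, long seed) {
        Random random = new Random(seed);
        int[] data = new int[size];
        for (int i = 0; i < size; i++) {
            data[i] = random.nextInt(256);
        }
        return data;
    }

    // Crude wall-clock timing; a real measurement needs a benchmark harness.
    static long timeMillis(int[] data) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < 100; i++) {
            sink += sumAtLeast(data, 128);
        }
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        if (sink < 0) System.out.println(sink); // keeps the result alive
        return elapsed;
    }

    public static void main(String[] args) {
        int[] unsorted = randomValues(1 << 20, 42L); // unpredictable pattern
        int[] sorted = unsorted.clone();
        Arrays.sort(sorted);                         // false..false, then true..true
        for (int round = 0; round < 5; round++) {
            System.out.println("unsorted: " + timeMillis(unsorted)
                    + " ms, sorted: " + timeMillis(sorted) + " ms");
        }
    }
}
```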
15. String concatenation done within a single
expression is recognized by javac and replaced
with the StringBuilder equivalent.
String concatenation example
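A small sketch of what that replacement looks like. The class and method names here are made up for illustration. `greetPlus` is the code as written, and `greetBuilder` is roughly what older javac versions generate for it. Since Java 9 javac emits an `invokedynamic` call to `StringConcatFactory` instead, so the exact bytecode depends on the compiler version.

```java
class StringConcatSketch {

    // As written in source: one expression, so the compiler handles it as a whole.
    static String greetPlus(String name, int count) {
        return "Hello, " + name + "! You have " + count + " messages.";
    }

    // Roughly the hand-written equivalent of what javac (before Java 9) produces.
    static String greetBuilder(String name, int count) {
        return new StringBuilder()
                .append("Hello, ")
                .append(name)
                .append("! You have ")
                .append(count)
                .append(" messages.")
                .toString();
    }

    // Concatenation spread across loop iterations is NOT merged into one builder,
    // so an explicit StringBuilder is still worth writing by hand here.
    static String joinAll(String[] parts) {
        StringBuilder builder = new StringBuilder();
        for (String part : parts) {
            builder.append(part);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(greetPlus("Ann", 3));
        System.out.println(greetBuilder("Ann", 3));
        System.out.println(joinAll(new String[] {"a", "b", "c"}));
    }
}
```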
17. Intrinsics
Intrinsics are methods KNOWN to the JIT. Their bytecode is
ignored, and the most performant native version for the target
platform is used instead...
● System::arraycopy
● String::equals
● Math::*
● Object::hashCode
● Object::getClass
● Unsafe::*
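As an illustration, here is a sketch that uses `System.arraycopy` from the list above next to a hand-written loop. The class and method names are made up. Both methods return the same result. The practical point is only that calling the library method lets the JIT substitute its platform-specific version, whereas the loop has to be optimized like any other code.

```java
import java.util.Arrays;

class IntrinsicSketch {

    // Hand-written copy: ordinary bytecode that the JIT optimizes like any other loop.
    static int[] copyManually(int[] source) {
        int[] target = new int[source.length];
        for (int i = 0; i < source.length; i++) {
            target[i] = source[i];
        }
        return target;
    }

    // Library call: a method known to the JIT, which can substitute its own version.
    static int[] copyWithArraycopy(int[] source) {
        int[] target = new int[source.length];
        System.arraycopy(source, 0, target, 0, source.length);
        return target;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(Arrays.toString(copyManually(data)));
        System.out.println(Arrays.toString(copyWithArraycopy(data)));
    }
}
```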
18. Escape Analysis
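Any object that does not escape its creation
scope MAY have its heap allocation optimized away
(HotSpot does this through scalar replacement,
often loosely called stack allocation).
Typical candidates are lambdas, anonymous classes,
DateTime objects, StringBuilders, Optionals etc...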
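A sketch of the difference between an object that stays inside its creation scope and one that escapes. `Point`, `lastSeen` and both method names are hypothetical. Whether the JIT actually removes the allocation in the first method depends on the JVM, its flags and inlining decisions, so treat this as an illustration of eligibility, not a guarantee.

```java
class EscapeAnalysisSketch {

    // Hypothetical small value holder.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Both Points live and die inside this method: candidates for allocation removal.
    static long distanceSquared(int x1, int y1, int x2, int y2) {
        Point a = new Point(x1, y1);
        Point b = new Point(x2, y2);
        long dx = a.x - b.x;
        long dy = a.y - b.y;
        return dx * dx + dy * dy;
    }

    static Point lastSeen; // anything stored here is reachable from everywhere

    // Same computation, but `a` escapes through the static field,
    // so it has to be a real heap object.
    static long distanceSquaredEscaping(int x1, int y1, int x2, int y2) {
        Point a = new Point(x1, y1);
        lastSeen = a;
        Point b = new Point(x2, y2);
        long dx = a.x - b.x;
        long dy = a.y - b.y;
        return dx * dx + dy * dy;
    }

    public static void main(String[] args) {
        System.out.println(distanceSquared(0, 0, 3, 4));
        System.out.println(distanceSquaredEscaping(0, 0, 3, 4));
    }
}
```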
22. Conclusion
Before attempting to “optimize” something at a low level, make
sure you understand what the environment is already
optimizing for you…
Don't try to predict the performance (especially the low-level
behavior) of your program by looking at the bytecode. When
the JIT compiler is done with it, there will not be many
similarities left.
27. Cache access latencies
CPUs are getting faster not through higher clock frequency but through lower latency between
cache levels (L1, L2, L3), better cache coherency protocols and smarter optimizations.
28. Why is Concurrency HARD?
Problem 1 : VISIBILITY!
● Any processor can temporarily keep some values in its
caches (L1, L2) instead of main memory, so another processor
might not see changes made by the first processor…
● Also, if a processor works for some time out of its own caches, it
might not see changes made by other processors right
away...
31. JMM (Java Memory Model)
The Java Memory Model is a set of rules and
guidelines that allows Java programs to
behave deterministically across multiple
memory architectures, CPUs, and operating
systems.
38. Conclusions on Volatile
● Volatile guarantees that changes made by one thread are visible
to other threads.
● Guarantees that reads and writes of volatile fields are never reordered
with each other (ordinary instructions before and after can still be
reordered among themselves, but writes before a volatile write cannot
move after it, and reads after a volatile read cannot move before it).
● Volatile without additional synchronization is enough if
only one thread writes to the volatile field; if there is more
than one writer, you need to synchronize...
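The single-writer rule can be sketched as follows. The class names are made up, and `AtomicLong` is used as one possible form of the extra synchronization for the multi-writer case (a lock would work as well). The volatile counter does a plain read-modify-write, which is only safe because exactly one thread ever calls `increment()`.

```java
import java.util.concurrent.atomic.AtomicLong;

class VolatileSketch {

    // Safe ONLY with a single writer thread; any number of threads may call get().
    static final class SingleWriterCounter {
        private volatile long value;

        void increment() {
            value = value + 1; // read-modify-write, not atomic
        }

        long get() {
            return value;
        }
    }

    static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // One writer: volatile alone is enough.
    static long runSingleWriter(int increments) {
        SingleWriterCounter counter = new SingleWriterCounter();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < increments; i++) {
                counter.increment();
            }
        });
        writer.start();
        join(writer);
        return counter.get();
    }

    // Several writers: the increment itself must be atomic, so use AtomicLong.
    static long runManyWriters(int threads, int incrementsPerThread) {
        AtomicLong counter = new AtomicLong();
        Thread[] writers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            writers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.incrementAndGet();
                }
            });
            writers[t].start();
        }
        for (Thread writer : writers) {
            join(writer);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runSingleWriter(10_000));
        System.out.println(runManyWriters(4, 10_000));
    }
}
```

If several threads called `SingleWriterCounter.increment()` concurrently, increments could be lost even though every write is visible, because visibility does not make the read-modify-write atomic.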
58. IMPORTANT!
Sometimes horizontal scaling is cheaper. Developing hardware-friendly code is hard: it breaks easily if
a new developer does not understand the existing code base, or if a new version of the JVM applies
optimizations you never expected (this happens a lot), and it is hard to test. If your product needs higher
throughput, you either make it more efficient or scale it out. When the cost of scaling is too high, it makes
perfect sense to make the system more efficient (assuming the system is not fundamentally inefficient).
If you are scaling your product and a single node under the highest load utilizes only a low percentage of
its resources (CPU, memory etc…), then your system is inefficient.
Developing hardware-friendly code is all about efficiency. On most systems you might NEVER
need to go low level, but knowledge of the low-level semantics of your environment will enable you to
write more efficient code by default.
And most important: NEVER EVER optimize without
BENCHMARKING!!!
60. Example of Disruptor usage: Log4j2
In the test with 64 threads, asynchronous loggers are 12 times faster than
asynchronous appenders, and 68 times faster than synchronous loggers.
61. Why?
● Generally any traditional queue is in one of two states: either it's filling
up, or it's draining.
● Most queues are unbounded, and any unbounded queue is a
potential OOM source.
● Queues write to memory on both put and poll… and writes are
expensive. During a write the queue is locked (or partially locked).
● Queues are the best way to create CONTENTION! That is often the
bottleneck of the system.
63. What is the Disruptor all about?
● Non-blocking. A write does not lock consumers, and consumers work in
parallel, with controlled access to data in the queue, and without
CONTENTION!
● GC free: the Disruptor does not create any objects while running; instead it
preallocates all the memory for its predefined size up front.
● The Disruptor is bounded.
● Cache friendly. (Mechanical sympathy)
● It's hardware friendly. The Disruptor uses all the low-level semantics of the JMM
to achieve maximum throughput and minimum latency.
● One thread per consumer.
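The ideas in this list can be sketched with a toy ring buffer. This is NOT the Disruptor's API or implementation, only a minimal single-producer, single-consumer illustration with made-up names (`RingBufferSketch`, `tryPublish`, `tryConsume`). It is bounded, preallocates every slot, takes no locks, and relies on two volatile sequence counters that each have exactly one writer thread.

```java
import java.util.function.LongConsumer;

class RingBufferSketch {

    // Mutable slot, allocated once and reused forever.
    static final class Event {
        long value;
    }

    private final Event[] entries;
    private final int mask;
    private volatile long producerSequence; // next slot to write; written only by the producer
    private volatile long consumerSequence; // next slot to read; written only by the consumer

    RingBufferSketch(int sizePowerOfTwo) {
        if (sizePowerOfTwo <= 0 || Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a positive power of two");
        }
        entries = new Event[sizePowerOfTwo];
        for (int i = 0; i < entries.length; i++) {
            entries[i] = new Event(); // all allocation happens here
        }
        mask = sizePowerOfTwo - 1;
    }

    // Producer thread only. Returns false instead of blocking when the buffer is full.
    boolean tryPublish(long value) {
        long next = producerSequence;
        if (next - consumerSequence == entries.length) {
            return false;
        }
        entries[(int) (next & mask)].value = value;
        producerSequence = next + 1; // volatile write makes the slot visible
        return true;
    }

    // Consumer thread only. Returns false instead of blocking when the buffer is empty.
    boolean tryConsume(LongConsumer handler) {
        long next = consumerSequence;
        if (next == producerSequence) {
            return false;
        }
        handler.accept(entries[(int) (next & mask)].value);
        consumerSequence = next + 1; // volatile write hands the slot back to the producer
        return true;
    }

    public static void main(String[] args) {
        RingBufferSketch buffer = new RingBufferSketch(64);
        int total = 1_000;
        Thread producer = new Thread(() -> {
            for (long v = 1; v <= total; v++) {
                while (!buffer.tryPublish(v)) {
                    Thread.yield(); // buffer full, wait for the consumer
                }
            }
        });
        producer.start();

        long[] sum = new long[1];
        int consumed = 0;
        while (consumed < total) {
            if (buffer.tryConsume(v -> sum[0] += v)) {
                consumed++;
            } else {
                Thread.yield(); // buffer empty, wait for the producer
            }
        }
        System.out.println("sum = " + sum[0]);
    }
}
```

The real Disruptor adds much more on top of this idea, such as multiple consumers, configurable wait strategies and padding against false sharing, none of which is shown here.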
88. Disruptor (Cons)
● Not as trivial as ArrayBlockingQueue (ABQ) or other queues
● Reasonable limit for busy threads (consumers)
● Not a drop-in replacement; it is a different approach to queues
96. And some stuff about high performance Java code
● https://www.youtube.com/watch?v=NEG8tMn36VQ
● https://www.youtube.com/watch?v=t49bfPLp0B0
● http://www.slideshare.net/PeterLawrey/writing-and-testing-high-frequency-trading-engines-in-java
● https://www.youtube.com/watch?v=ih-IZHpxFkY
98. Coming next
Concurrency : Level 1
Concurrency primitives provided by language SDK. Everything that
provides manual control over concurrency.
- package java.util.concurrent.*
- Future
- CompletableFuture
- Phaser
- ForkJoinPool (in Java 8), ForkJoinTask, CountedCompleter
Concurrency : Level 2
High level approach to concurrency, when library or framework handles
concurrent execution of the code... (will cover only RxJava although
there is a bunch of other good stuff)
- Functional Programming approach (higher-order functions)
- Optional
- Streams
- Reactive Programming (RxJava)