Java
Under the hood
Javac and JVM optimizations
Agenda
● Javac and JVM optimizations
○ JIT (Just In Time Compilation)
■ Profiling, Method Binding, Safepoints
○ Method Inlining
○ Loop Unrolling
○ Lock Coarsening
○ Lock Elision
○ Branch Prediction
○ Escape Analysis
○ OSR (On Stack Replacement)
○ TLAB (Thread Local Allocation Buffers)
Java program lifetime
JIT Compilation
Method Inlining
Loop Unrolling
Lock Coarsening
Lock Elision
Branch Prediction
● Performance of an if-statement depends on whether its
condition has a predictable pattern.
● A “bad” true-false pattern can make an if-statement up
to six times slower than a “good” pattern!
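A minimal sketch of how the effect could be observed, assuming a simple threshold sum over an int array (the class and method names are hypothetical, not from the slides). The same loop runs over shuffled data (unpredictable branch) and over sorted data (predictable branch). The timing in main is crude and only indicative; the JIT may also replace the branch with a conditional move, in which case the gap disappears, so a JMH benchmark is needed for real numbers.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical demo class; all names here are illustrative and not from the slides.
class BranchPatternDemo {

    // Sums every element >= threshold. The if-statement is the branch under study.
    static long sumAtLeast(int[] data, int threshold) {
        long sum = 0;
        for (int value : data) {
            if (value >= threshold) {
                sum += value;
            }
        }
        return sum;
    }

    // Pseudo-random values in [0, 256), reproducible through the seed.
    static int[] randomData(int size, long seed) {
        Random random = new Random(seed);
        int[] data = new int[size];
        for (int i = 0; i < size; i++) {
            data[i] = random.nextInt(256);
        }
        return data;
    }

    public static void main(String[] args) {
        int[] unsorted = randomData(1 << 20, 42L);
        int[] sorted = unsorted.clone();
        Arrays.sort(sorted); // "good" pattern: a run of false outcomes, then a run of true ones

        for (int[] data : new int[][] {unsorted, sorted}) {
            long start = System.nanoTime();
            long sum = 0;
            for (int round = 0; round < 100; round++) {
                sum += sumAtLeast(data, 128);
            }
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("sum=" + sum + " time=" + micros + "us");
        }
    }
}
```

Both runs compute the same sum; only the order of the branch outcomes differs.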
String concatenation within a single expression is picked
up by javac and replaced with a StringBuilder equivalent
(Java 8 and earlier) or, since Java 9, with an invokedynamic
call to StringConcatFactory.
String concatenation example
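The original example slide is not part of this transcript. As a stand-in, here is a small sketch with hypothetical names: a one-expression concatenation, the hand-written StringBuilder chain that older javac versions generate for it, and a loop where a single builder is reused by hand because each iteration is a separate expression.

```java
// Hypothetical demo class; names are illustrative and not from the slides.
class ConcatDemo {

    // One expression: javac translates the whole chain at once.
    static String label(String name, int count) {
        return "item=" + name + ", count=" + count;
    }

    // Hand-written equivalent of what pre-Java-9 javac emits for label().
    static String labelExplicit(String name, int count) {
        return new StringBuilder()
                .append("item=").append(name)
                .append(", count=").append(count)
                .toString();
    }

    // Concatenation spread over loop iterations is a separate expression each time,
    // so reusing one builder by hand avoids re-copying the accumulated text.
    static String join(String[] parts) {
        StringBuilder builder = new StringBuilder();
        for (String part : parts) {
            builder.append(part);
        }
        return builder.toString();
    }
}
```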
Intrinsics
Intrinsics are methods KNOWN to the JIT. Their bytecode is
ignored and the most performant native version for the target
platform is used instead...
● System::arraycopy
● String::equals
● Math::*
● Object::hashCode
● Object::getClass
● Unsafe::*
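A small sketch of what this means in practice, with hypothetical names: both methods below return the same copy, but the second one calls System.arraycopy, which the JIT can replace with a platform-specific copy routine instead of compiling Java bytecode for it. Whether the plain loop ends up slower depends on the JVM, since the JIT may recognize and optimize the loop as well.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative and not from the slides.
class ArrayCopyDemo {

    // Element-by-element copy written in plain Java.
    static int[] copyWithLoop(int[] source) {
        int[] target = new int[source.length];
        for (int i = 0; i < source.length; i++) {
            target[i] = source[i];
        }
        return target;
    }

    // Same result through System.arraycopy, one of the methods known to the JIT.
    static int[] copyWithArraycopy(int[] source) {
        int[] target = new int[source.length];
        System.arraycopy(source, 0, target, 0, source.length);
        return target;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(Arrays.equals(copyWithLoop(data), copyWithArraycopy(data)));
    }
}
```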
Escape Analysis
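Any object that does not escape its creation
scope MAY have its heap allocation optimized away
(HotSpot does this through scalar replacement, often
loosely described as stack allocation).
Typical candidates: lambdas, anonymous classes,
date/time objects, StringBuilders, Optionals, etc.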
Escape analysis
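The slide's own example is not in this transcript, so here is a sketch with hypothetical names. In the first method the Point never leaves the method, which makes it a candidate for the optimization. In the second it is stored in a static field, so it escapes and has to be a real heap object. Whether the JIT actually removes the first allocation depends on inlining and the JVM version.

```java
// Hypothetical demo class; names are illustrative and not from the slides.
class EscapeDemo {

    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Written only to demonstrate an escaping reference.
    static Point lastCreated;

    // The Point is created, read and dropped inside this method: it does not escape.
    static int distanceSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    // The Point is published through a static field: it escapes its creation scope.
    static int distanceSquaredEscaping(int x, int y) {
        Point p = new Point(x, y);
        lastCreated = p;
        return p.x * p.x + p.y * p.y;
    }
}
```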
TLAB (Thread Local Allocation Buffers)
How to “see” JIT activity? - JitWatch
Conclusion
Before attempting to “optimize” something at a low level, make
sure you understand what the environment is already
optimizing for you…
Don't try to predict the performance (especially low-level
behavior) of your program by looking at the bytecode. When
the JIT compiler is done with it, there will not be many
similarities left.
Questions?
Concurrency : Level 0
Agenda
● Concurrency : Hardware level
○ CPU architecture evolution
○ Cache Coherency Protocols
○ Memory Barriers
○ Store Buffers
○ Cachelines
○ volatiles, monitors (locks, synchronization), atomics
CPU structure
Cache access latencies
CPUs are getting faster not through higher clock frequency but through lower latency between
cache levels, better cache coherency protocols and smart optimizations.
Why Concurrency is HARD?
Problem 1 : VISIBILITY!
● Any processor can temporarily keep some values in its
caches instead of main memory, so another processor
might not see changes made by the first processor…
● Also, if a processor works out of its caches for some time, it
might not see changes made by another processor right
away...
Why Concurrency is HARD?
Problem 2 : Reordering
Example : Non thread safe
JMM (Java Memory Model)
The Java Memory Model is a set of rules and
guidelines that allows Java programs to
behave consistently across different
memory architectures, CPUs and operating
systems.
Thread safe version (visibility + reordering both solved)
Thread safe version
cpu/x86/vm/c1_LIRGenerator_x86.cpp
Example : Thread safe
Happens Before
Understanding volatile
Conclusions on Volatile
● Volatile guarantees that changes made by one thread are visible
to other threads.
● Reads and writes of a volatile field are never reordered with
each other. Ordinary accesses can still be reordered, but not
from before a volatile write to after it, nor from after a
volatile read to before it.
● Volatile without additional synchronization is enough if there
is only one writer to the volatile field; with more than one
writer (for example, a concurrent counter++) you need to synchronize...
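A minimal sketch of the single-writer case described above, using hypothetical names. One thread writes a plain field and then sets a volatile flag; the reader spins on the flag and then reads the plain field. The volatile write/read pair is what makes the earlier plain write visible to the reader.

```java
// Hypothetical demo class; names are illustrative and not from the slides.
class VolatileFlagDemo {
    private int payload;             // plain field, published through the volatile write below
    private volatile boolean ready;  // one writer, any number of readers

    // Called by the single writer thread.
    void publish(int value) {
        payload = value; // ordinary write
        ready = true;    // volatile write: the payload write cannot move after it
    }

    // Called by a reader thread; spins until the flag becomes visible.
    int awaitPayload() {
        while (!ready) {
            // busy-wait; `ready` is re-read on every pass because it is volatile
        }
        return payload; // guaranteed to see the value written before ready = true
    }

    // Runs one writer and one reader and returns what the reader observed.
    static int runOnce(int value) {
        VolatileFlagDemo demo = new VolatileFlagDemo();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = demo.awaitPayload());
        reader.setDaemon(true);
        reader.start();
        demo.publish(value);
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

Without volatile on ready, the reader loop would be allowed to spin forever or to return a stale payload.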
Volatile Write/Read performance
Lazy Singleton (not thread safe)
Lazy Singleton (dumb thread safety)
Lazy Singleton (not thread safe)
Lazy Singleton (still not thread safe)
Lazy Singleton (thread safe yay!)
Happens Before
Lazy Singleton (CL trick)
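The code from these slides is not in the transcript. The sketch below shows two widely used thread-safe variants and assumes they match the last two slides: double-checked locking with a volatile field, and the class-loader (holder) idiom, where the JVM's class initialization provides the synchronization. Class names are hypothetical.

```java
// Double-checked locking; it is only correct because `instance` is volatile.
final class DclSingleton {
    private static volatile DclSingleton instance;

    private DclSingleton() {
    }

    static DclSingleton getInstance() {
        DclSingleton local = instance;      // one volatile read on the fast path
        if (local == null) {
            synchronized (DclSingleton.class) {
                local = instance;           // re-check under the lock
                if (local == null) {
                    local = new DclSingleton();
                    instance = local;       // volatile write publishes the fully built object
                }
            }
        }
        return local;
    }
}

// Class-loader trick: Holder is initialized lazily, once, and thread-safely by the JVM.
final class HolderSingleton {
    private HolderSingleton() {
    }

    private static final class Holder {
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }

    static HolderSingleton getInstance() {
        return Holder.INSTANCE; // first call triggers initialization of Holder
    }
}
```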
False sharing (hidden contention)
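A sketch of how false sharing can be provoked and avoided, with hypothetical names and an assumed 64-byte cache line. Each thread increments only its own counter, so there is no logical sharing. With stride 1 the counters sit next to each other and most likely share a cache line; with stride 8 (8 longs = 64 bytes) they fall on different lines. The results are identical; only the running time is expected to differ, and that should be measured with JMH.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical demo class; names are illustrative and not from the slides.
class FalseSharingDemo {
    // 8 longs * 8 bytes = 64 bytes, an ASSUMED cache-line size.
    static final int PADDED_STRIDE = 8;

    // Each thread increments its own slot; slots are `stride` elements apart.
    static long[] run(int threads, int stride, int iterations) {
        AtomicLongArray slots = new AtomicLongArray(threads * stride);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int index = t * stride;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    slots.incrementAndGet(index); // every increment really hits memory
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long[] result = new long[threads];
        for (int t = 0; t < threads; t++) {
            result[t] = slots.get(t * stride);
        }
        return result;
    }
}
```

Calling run(2, 1, n) and run(2, FalseSharingDemo.PADDED_STRIDE, n) gives the adjacent and the padded layout respectively.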
Monitors
Monitor Operations :
● monitorenter
● monitorexit
● wait
● notify/notifyAll
Monitor States :
● init
● biased
● thin
● fat (inflated)
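To connect the monitor operations listed above to source code, here is a sketch of a one-slot handoff box (hypothetical names). A synchronized block compiles to monitorenter/monitorexit, and wait/notifyAll are called on the same lock object while its monitor is held.

```java
// Hypothetical demo class; names are illustrative and not from the slides.
class HandoffBox {
    private final Object lock = new Object();
    private Integer slot; // guarded by `lock`; null means empty

    void put(int value) throws InterruptedException {
        synchronized (lock) {          // monitorenter
            while (slot != null) {
                lock.wait();           // releases the monitor until notified
            }
            slot = value;
            lock.notifyAll();          // wake up a waiting taker
        }                              // monitorexit
    }

    int take() throws InterruptedException {
        synchronized (lock) {
            while (slot == null) {
                lock.wait();
            }
            int value = slot;
            slot = null;
            lock.notifyAll();          // wake up a waiting producer
            return value;
        }
    }

    // Passes one value from a producer thread to the calling thread.
    static int roundTrip(int value) {
        HandoffBox box = new HandoffBox();
        Thread producer = new Thread(() -> {
            try {
                box.put(value);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            int received = box.take();
            producer.join();
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```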
Cost of Contention
Conclusion
● Volatile reads are not that bad
● Avoid sharing state
● Avoid writing to shared state
● Avoid Contention
Tools
● JMH: OpenJDK tool for writing correct benchmarks
● JMH Samples
● jcstress: tool for testing critical sections of concurrent code
● JOL (Java Object Layout): helps to measure the sizes of objects
JMH example
Jcstress example
Jcstress sample output
IMPORTANT!
Sometimes horizontal scaling is cheaper. Developing hardware-friendly code is hard: it breaks easily if a
new developer does not understand the existing code base, or if a new JVM version applies optimizations
you never expected (happens a lot), and it is hard to test. If your product needs higher throughput, you either
make it more efficient or scale it. When the cost of scaling is too high, it makes perfect sense to make the
system more efficient (assuming the system is not fundamentally inefficient).
If you’re scaling your product and a single node under peak load uses only a low percentage of its resources
(CPU, memory, etc.), then your system is inefficient.
Developing hardware-friendly code is all about efficiency. On most systems you might NEVER
need to go low level, but knowledge of the low-level semantics of your environment will enable you to
write more efficient code by default.
And most important: NEVER EVER optimize without
BENCHMARKING!!!
Disruptor by LMAX
Example of Disruptor usage : Log4j2
In the test with 64 threads, asynchronous loggers are 12 times faster than
asynchronous appenders, and 68 times faster than synchronous loggers.
Why?
● Generally any traditional queue is in one of two states : either it's filling
up, or it’s draining.
● Most queues are unbounded : and any unbounded queue is a
potential OOM source.
● Queues write to memory : both put and take modify the queue… and writes are
expensive. During a write the queue is locked (or partially locked).
● Queues are the best way to create CONTENTION! That is often the
bottleneck of the system.
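The bounded versus unbounded point can be illustrated with a standard JDK queue. In this sketch (hypothetical class name) an ArrayBlockingQueue with a fixed capacity rejects offers once it is full instead of growing until memory runs out; an unbounded queue would accept every offer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo class; names are illustrative and not from the slides.
class BoundedQueueDemo {

    // Offers `attempts` items to a queue of fixed capacity and counts the rejections.
    static int rejectedOffers(int capacity, int attempts) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int i = 0; i < attempts; i++) {
            if (!queue.offer(i)) { // offer() returns false instead of blocking when full
                rejected++;
            }
        }
        return rejected;
    }
}
```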
Queue typical state
What is Disruptor all about?
● Non-blocking. A write does not lock consumers, and consumers work in
parallel, with controlled access to data in the queue, and without
CONTENTION!
● GC free : Disruptor does not create any objects while running; instead it
preallocates up front all the memory configured for it.
● Disruptor is bounded.
● Cache friendly. (Mechanical sympathy)
● It's hardware friendly. Disruptor uses the low-level semantics of the JMM
to achieve maximum throughput and minimum latency.
● One thread per consumer.
Theory : understanding disruptor
Writing to Ring Buffer
Reading from Ring Buffer
Disruptor can coordinate consumers
Lmax architecture
Disruptor (Pros)
● Performance of course
● Holy BATCHING!!!
● Mechanical Sympathy
● Optionally GC Free
● Prevents False Sharing
● Easy to compose dependent consumers (concurrency)
● Synchronization-free code in consumers
● Data Structure (not a frickin framework!!!)
● Fits very well with CQRS and ES
Disruptor (Pros)
● Thread affinity (for more performance/throughput)
● Different strategies for Consumers (busy spin, sleep)
● Single/Multiple producer strategy
Avoid useless processing (Disruptor can batch)
Disruptor (Cons)
● Not as trivial as ABQ (ArrayBlockingQueue) or other queues
● Reasonable limit for busy threads (consumers)
● Not a drop-in replacement; it is a different approach to queues
Disruptor Implementation (simplified : single writer)
No locks at all ( Atomic.lazySet )
Why power of 2?
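The implementation slides are not in this transcript. The sketch below is a simplified single-producer, single-consumer ring buffer with hypothetical names; it is NOT the LMAX Disruptor API or its source. It shows the three ideas named in the slide titles: a preallocated array, publishing with AtomicLong.lazySet instead of a lock, and a power-of-two capacity so that `sequence & (capacity - 1)` can replace the slower `sequence % capacity`.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-producer / single-consumer ring buffer; not the LMAX Disruptor API.
class SpscRing {
    private final long[] entries; // preallocated once, reused forever
    private final int mask;       // capacity - 1; valid only for a power-of-two capacity
    private final AtomicLong published = new AtomicLong(-1); // last sequence written
    private final AtomicLong consumed = new AtomicLong(-1);  // last sequence read

    SpscRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        this.entries = new long[capacity];
        this.mask = capacity - 1;
    }

    // Producer thread only. Returns false when the ring is full.
    boolean offer(long value) {
        long next = published.get() + 1;
        if (next - consumed.get() > entries.length) {
            return false; // writing would overwrite an entry that was not read yet
        }
        entries[(int) (next & mask)] = value; // same slot as next % capacity
        published.lazySet(next);              // ordered store: no lock, no full fence
        return true;
    }

    // Consumer thread only. Returns null when the ring is empty.
    Long poll() {
        long next = consumed.get() + 1;
        if (next > published.get()) {
            return null; // nothing published beyond what was already read
        }
        long value = entries[(int) (next & mask)];
        consumed.lazySet(next); // frees the slot for the producer
        return value;
    }
}
```

The entry is written before the sequence is published and read after the sequence is observed, which is what makes the lock-free handoff safe for exactly one writer and one reader.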
Ring Buffer customizations
● Producer strategies
○ Single producer
○ Multiple producer
● Wait Strategies
○ Sleeping Wait
○ Yielding Wait
○ Busy Spin
Resources
JitWatch
Peter Lawrey blog
Aleksey Shipilyov stuff
About TLAB
About Monitors
About Memory Barriers
And some stuff about high performance Java code
● https://www.youtube.com/watch?v=NEG8tMn36VQ
● https://www.youtube.com/watch?v=t49bfPLp0B0
● http://www.slideshare.net/PeterLawrey/writing-and-testing-high-frequency-trading-engines-in-java
● https://www.youtube.com/watch?v=ih-IZHpxFkY
Links for LMAX Disruptor
● https://www.youtube.com/watch?v=DCdGlxBbKU4
● https://www.youtube.com/watch?v=KrWxle6U10M
● https://www.youtube.com/watch?v=IsGBA9KEtTM
● https://www.youtube.com/watch?v=o_nXgoTxBsQ
● http://martinfowler.com/articles/lmax.html
● https://www.youtube.com/watch?v=eTeWxZvlCZ8
Coming next
Concurrency : Level 1
Concurrency primitives provided by language SDK. Everything that
provides manual control over concurrency.
- package java.util.concurrent.*
- Future
- CompletableFuture
- Phaser
- ForkJoinPool (in Java 8), ForkJoinTask, CountedCompleters
Concurrency : Level 2
High level approach to concurrency, when library or framework handles
concurrent execution of the code... (will cover only RxJava although
there is a bunch of other good stuff)
- Functional Programming approach (higher-order functions)
- Optional
- Streams
- Reactive Programming (RxJava)

Java under the hood

  • 2. Javac and JVM optimizations
  • 3.
  • 4. Agenda ● Javac and JVM optimizations ○ JIT (Just In Time Compilation) ■ Profiling, Method Binding, Safepoints ○ Method Inlining, ○ Loop Unrolling, ○ Lock Coarsening ○ Lock Eliding, ○ Branch Prediction, ○ Escape Analysis ○ OSR (On Stack Replacement) ○ TLAB (Thread Local Allocation Buffers)
  • 14. Branch Prediction ● Performance of an if-statement depends on whether its condition has a predictable pattern. ● A “bad” true-false pattern can make an if-statement up to six times slower than a “good” pattern!
  • 15. Doing string concatenation in one scope will be picked by javac and replaced with StringBuilder equivalent. String concatenation example
  • 17. Intrinsics Intrinsics are methods KNOWN to JIT. Bytecodes of those are ignored and native most performant versions for target platform is used... ● System::arraycopy ● String::equals ● Math::* ● Object::hashcode ● Object::getClass ● Unsafe::*
  • 18. Escape Analysis Any object that is not escaping its creation scope MAY be optimized to stack allocation. Mostly Lambdas, Anonymous classes, DateTime, String Builders, Optionals etc...
  • 20. TLAB (Thread Local Allocation Buffers)
  • 21. How to “see” JIT activity? - JitWatch
  • 22. Conclusion Before attempting to “optimize” something in low level, make sure you understand what the environment is already optimizing for you… Dont try to predict the performance (especially low-level behavior) of your program by looking at the bytecode. When the JIT Compiler is done with it, there will not be much similarities left.
  • 25. Agenda ● Concurrency : Hardware level ○ CPU architecture evolution ○ Cache Coherency Protocols ○ Memory Barriers ○ Store Buffers ○ Cachelines ○ volatiles, monitors (locks, synchronization), atomics
  • 27. Cache access latencies CPUs are getting faster not by frequency but by lower latency between L caches, better cache coherency protocols and smart optimizations.
  • 28. Why Concurrency is HARD? Problem 1 : VISIBILITY! ● Any processor can temporarily store some values to L caches instead of Main memory, thus other processor might not see changes made by first processor… ● Also if processor works for some time with L caches it might not see changes made by other processor right away...
  • 29. Why Concurrency is HARD? Problem 2 : Reordering
  • 30. Example : Non thread safe
  • 31. JMM (Java Memory Model) Java Memory model is set of rules and guidelines which allows Java programs to behave deterministically across multiple memory architecture, CPU, and operating systems.
  • 32. Thread safe version (visibility + reordering both solved)
  • 38. Conclusions on Volatile ● Volatile guarantees that changes made by one thread is visible to other thread. ● Guarantees that read/write to volatile field is never reordered (instructions before and after can be reordered). ● Volatile without additional synchronization is enough if you have only one writer to the volatile field, if there are more than one you need to synchronize...
  • 40. Lazy Singleton (not thread safe)
  • 41. Lazy Singleton (dumb thread safety)
  • 42. Lazy Singleton (not thread safe)
  • 43. Lazy Singleton (still not thread safe)
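The code for these slides is not part of the transcript. As a hedged sketch, one commonly used thread-safe form is double-checked locking with a `volatile` field; this is not necessarily the version the deck arrives at, and the class name is an assumption.

```java
final class LazySingletonSketch {
    // volatile is what makes double-checked locking correct under the JMM
    // (Java 5 and later): the write below safely publishes the constructed object.
    private static volatile LazySingletonSketch instance;

    private LazySingletonSketch() {
    }

    static LazySingletonSketch getInstance() {
        LazySingletonSketch local = instance;        // one volatile read on the fast path
        if (local == null) {
            synchronized (LazySingletonSketch.class) {
                local = instance;                    // re-check under the lock
                if (local == null) {
                    local = new LazySingletonSketch();
                    instance = local;                // volatile write publishes the object
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance());
    }
}
```

Dropping `volatile` from the field brings back the reordering problem: another thread could observe a non-null reference to an object whose constructor writes are not yet visible.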
  • 47. False sharing (hidden contention)
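A rough sketch of one way to reduce false sharing: padding a hot field so that two independently updated counters are unlikely to land on the same cache line. The class names are hypothetical, the sketch assumes 64-byte cache lines, and the JVM specification does not guarantee field layout, so the effect should be verified with JOL and a JMH benchmark rather than taken for granted.

```java
// Padding is spread over a class hierarchy because the JVM is free to reorder
// fields within one class; superclass fields are normally laid out first.
class LhsPadding {
    long p1, p2, p3, p4, p5, p6, p7;
}

class PaddedValue extends LhsPadding {
    volatile long value;
}

final class PaddedCounter extends PaddedValue {
    long p9, p10, p11, p12, p13, p14, p15;

    // Safe only with a single writer per counter: value++ is not atomic.
    void increment() {
        value++;
    }

    long get() {
        return value;
    }
}

class FalseSharingSketch {
    // Two threads each update their own counter; with padding the two hot
    // fields should not share a cache line.
    static long[] run(int n) {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        Thread t1 = new Thread(() -> { for (int i = 0; i < n; i++) a.increment(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < n; i++) b.increment(); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] { a.get(), b.get() };
    }

    public static void main(String[] args) {
        long[] r = run(1_000_000);
        System.out.println(r[0] + " " + r[1]);
    }
}
```

The unpadded variant produces the same results; the difference only shows up in throughput, which is why it is called hidden contention.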
  • 50. Monitors Monitor Operations : ● monitorenter ● monitorexit ● wait ● notify/notifyAll Monitor States : ● init ● biased ● thin ● fat (inflated)
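The monitor operations listed above map onto Java source as follows: a `synchronized` block compiles to `monitorenter`/`monitorexit`, and `wait`/`notifyAll` are called on the object whose monitor is held. This is a minimal sketch with assumed names, not code from the slides.

```java
final class MonitorSketch {
    private final Object lock = new Object();
    private boolean ready; // guarded by lock

    void awaitReady() throws InterruptedException {
        synchronized (lock) {           // monitorenter
            while (!ready) {            // loop guards against spurious wakeups
                lock.wait();            // releases the monitor while waiting
            }
        }                               // monitorexit
    }

    void markReady() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();           // wakes every thread waiting on this monitor
        }
    }

    // Starts a waiter, signals it, and reports whether it finished in time.
    static boolean demo() {
        MonitorSketch m = new MonitorSketch();
        Thread waiter = new Thread(() -> {
            try {
                m.awaitReady();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        m.markReady();
        try {
            waiter.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !waiter.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("waiter finished: " + demo());
    }
}
```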
  • 52. Conclusion ● Volatile reads are not that bad ● Avoid sharing state ● Avoid writing to shared state ● Avoid Contention
  • 53. Tools ● JMH: OpenJDK tool for writing correct benchmarks ● JMH Samples ● jcstress: tool for testing critical sections of concurrent code ● JOL (Java Object Layout): helps to measure the sizes of objects
  • 58. IMPORTANT! Sometimes horizontal scaling is cheaper. Developing hardware-friendly code is hard: it breaks easily if a new developer does not understand the existing code base or a new JVM version applies optimizations you never expected (this happens a lot), and it is hard to test. If your product needs higher throughput, you either make it more efficient or scale it. When the cost of scaling is too high, it makes perfect sense to make the system more efficient (assuming the system is not fundamentally inefficient). If you are scaling your product and a single node under peak load uses only a small percentage of its resources (CPU, memory, etc.), then your system is inefficient. Developing hardware-friendly code is all about efficiency. On most systems you might NEVER need to go low level, but knowing the low-level semantics of your environment will let you write more efficient code by default. And most important: NEVER EVER optimize without BENCHMARKING!!!
  • 60. Example of Disruptor usage : Log4j2 In the test with 64 threads, asynchronous loggers are 12 times faster than asynchronous appenders, and 68 times faster than synchronous loggers.
  • 61. Why? ● Generally, any traditional queue is in one of two states: either it's filling up or it's draining. ● Most queues are unbounded, and any unbounded queue is a potential OOM source. ● Queues write to memory on both put and pull, and writes are expensive. During a write the queue is locked (or partially locked). ● Queues are the best way to create CONTENTION, which is often the bottleneck of the system.
  • 63. What is the Disruptor all about? ● Non-blocking. A write does not lock consumers, and consumers work in parallel, with controlled access to the data in the queue, and without CONTENTION! ● GC free: the Disruptor does not create any objects at all; instead it preallocates all the memory that was programmatically configured for it. ● The Disruptor is bounded. ● Cache friendly. (Mechanical sympathy) ● It's hardware friendly. The Disruptor uses all the low-level semantics of the JMM to achieve maximum performance and minimum latency. ● One thread per consumer.
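To illustrate the point about unbounded queues, here is a small sketch using `java.util.concurrent.ArrayBlockingQueue`. A bounded queue refuses new elements once it is full, which gives the producer a signal to slow down instead of growing until the heap runs out. The helper name is an assumption made for this example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedQueueSketch {
    // Offers `attempts` elements to a queue with the given capacity and
    // returns how many were accepted. Nothing consumes from the queue.
    static int offerUntilFull(int capacity, int attempts) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            if (queue.offer(i)) {   // non-blocking: returns false when the queue is full
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(offerUntilFull(2, 5)); // prints 2
    }
}
```

An unbounded queue in the same situation would accept every element, so a stalled consumer turns into steadily growing memory use.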
  • 71. Writing to Ring Buffer
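The diagrams for this part of the deck are not in the transcript. The following is a simplified sketch of the idea for one producer and one consumer: slots are preallocated, the producer writes into a slot and then publishes its sequence number, and the consumer processes everything published so far in one batch. All names here are assumptions; this is not the LMAX Disruptor API.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

final class RingBufferSketch {
    // Mutable, preallocated slot: no allocation happens while publishing.
    static final class Event {
        long value;
    }

    private final Event[] slots;
    private final int mask;
    private final AtomicLong published = new AtomicLong(-1); // last sequence visible to the consumer
    private final AtomicLong consumed = new AtomicLong(-1);  // last sequence the consumer finished
    private long nextToWrite = 0;                            // touched by the producer only

    RingBufferSketch(int size) {
        if (Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        slots = new Event[size];
        for (int i = 0; i < size; i++) {
            slots[i] = new Event();
        }
        mask = size - 1; // lets us use & instead of % to wrap around
    }

    // Producer side (single thread). Returns false when the buffer is full.
    boolean tryPublish(long value) {
        long seq = nextToWrite;
        if (seq - consumed.get() > slots.length) {
            return false; // would overwrite a slot the consumer has not read yet
        }
        slots[(int) (seq & mask)].value = value; // plain write into the slot
        published.set(seq);                      // volatile write makes the slot visible
        nextToWrite = seq + 1;
        return true;
    }

    // Consumer side (single thread). Handles every published event in one
    // batch and returns how many were processed.
    int drain(LongConsumer handler) {
        long available = published.get();
        int count = 0;
        for (long seq = consumed.get() + 1; seq <= available; seq++) {
            handler.accept(slots[(int) (seq & mask)].value);
            count++;
        }
        if (count > 0) {
            consumed.set(available); // frees the slots for the producer
        }
        return count;
    }
}
```

Because each sequence counter has exactly one writer, no locks or compare-and-swap loops are needed in this single-producer, single-consumer form.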
  • 85. Disruptor (Pros) ● Performance of course ● Holy BATCHING!!! ● Mechanical Sympathy ● Optionally GC Free ● Prevents False Sharing ● Easy to compose dependent consumers (concurrency) ● Synchronization free code in consumers ● Data Structure (not a frickin framework!!!) ● Fits very well with CQRS and ES
  • 86. Disruptor (Pros) ● Thread affinity (for more performance/throughput) ● Different strategies for Consumers (busy spin, sleep) ● Single/Multiple producer strategy
  • 87. Avoid useless processing (the Disruptor can batch)
  • 88. Disruptor (Cons) ● Not as trivial as ABQ (or other queues) ● Reasonable limit for busy threads (consumers) ● Not a drop-in replacement; it's a different approach to queues
  • 90. No locks at all ( Atomic*.lazySet, e.g. AtomicLong.lazySet )
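A small sketch of the `lazySet` idea using `java.util.concurrent.atomic.AtomicLong`: a single writer advances a sequence with an ordered store instead of a full volatile write, and a reader polls it without any lock. The class and method names are assumptions for illustration, and the sketch relies on the store becoming visible promptly, which holds in practice on HotSpot.

```java
import java.util.concurrent.atomic.AtomicLong;

class LazySetSketch {
    // One writer thread publishes 1..n; the calling thread spins until it
    // observes the final value, then returns what it saw.
    static long countTo(long n) {
        AtomicLong sequence = new AtomicLong(0);
        Thread writer = new Thread(() -> {
            for (long i = 1; i <= n; i++) {
                sequence.lazySet(i); // ordered store: cheaper than set(), single writer only
            }
        });
        writer.start();
        while (sequence.get() < n) { // volatile read, no lock involved
            Thread.yield();
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sequence.get();
    }

    public static void main(String[] args) {
        System.out.println(countTo(1_000_000));
    }
}
```

With more than one writer this pattern is no longer safe, and a compare-and-set or another coordination scheme would be needed.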
  • 92. Ring Buffer customizations ● Producer strategies ○ Single producer ○ Multiple producer ● Wait Strategies ○ Sleeping Wait ○ Yielding Wait ○ Busy Spin
  • 95. Resources JitWatch Peter Lawrey blog Aleksey Shipilyov stuff About TLAB About Monitors About Memory Barriers
  • 96. And some stuff about high performance Java code ● https://www.youtube.com/watch?v=NEG8tMn36VQ ● https://www.youtube.com/watch?v=t49bfPLp0B0 ● http://www.slideshare.net/PeterLawrey/writing-and-testing-high-frequency-trading-engines-in-java ● https://www.youtube.com/watch?v=ih-IZHpxFkY
  • 97. Links for LMAX Disruptor ● https://www.youtube.com/watch?v=DCdGlxBbKU4 ● https://www.youtube.com/watch?v=KrWxle6U10M ● https://www.youtube.com/watch?v=IsGBA9KEtTM ● https://www.youtube.com/watch?v=o_nXgoTxBsQ ● http://martinfowler.com/articles/lmax.html ● https://www.youtube.com/watch?v=eTeWxZvlCZ8
  • 98. Coming next Concurrency : Level 1 Concurrency primitives provided by the language SDK. Everything that provides manual control over concurrency. - package java.util.concurrent.* - Future - CompletableFuture - Phaser - ForkJoinPool (in Java 8), ForkJoinTask, CountedCompleters Concurrency : Level 2 High-level approach to concurrency, where a library or framework handles concurrent execution of the code... (will cover only RxJava although there is a bunch of other good stuff) - Functional Programming approach (higher-order functions) - Optional - Streams - Reactive Programming (RxJava)