jvm/java: towards lock-
free concurrency
Arvind Kalyan
Engineer at LinkedIn
agenda
intro to concurrency & memory model on jvm
reordering -> barriers -> happens-before
jdk 8 concurrency primitives
volatile -> atomics, collections, explicit locks, fork/join
trends in this area (to make all of this practical)
lock-free, STM, TM
background
for control and performance, sometimes there are valid
reasons to use locks (like a mutex) for concurrency control
in most other situations, primitive synchronization
constructs inside individual modules lead to unreliable &
incorrect programs in most non-trivial systems composed
of such modules
the best practice, in the current state, is to write single
threaded programs
‘automatic’ concurrency
there are platforms that take your single
threaded program and run it concurrently —
most web servers do this, for example
on the other hand, there are times when you
really must use multiple threads
practicality
concurrency control techniques have been studied for a
while, but since 2005 they have been studied intensely* to
make them more practical for more widespread (and safer) use
simpler software techniques, and also hardware level
support for those techniques are being developed
before we see how to write safe code using these new
techniques, let’s look into some basics
* https://scholar.google.com/scholar?as_ylo=2005&q=%22software+transactional+memory%22
why concurrency control?
when dealing with multiple threads,
concurrency control/synchronization is
necessary not only to guard critical sections
from multiple threads using a mutex…
but also to ensure that the memory updates
(through mutable variables) are made visible
to all threads ‘correctly’
memory model
as a platform, jvm guarantees that ‘correctly
synchronized’ programs have a very well
defined memory behavior
let’s look into the jvm memory model which
defines those guarantees
memory model
your code manipulates memory by using variables and
objects
the memory is separated by a few layers of caches from
the cpu
on a multi-core cpu when a write happens in one cpu’s
cache, we need to make it visible to other cpus as well
and then there is the topic of re-ordering…
* http://en.wikipedia.org/wiki/Memory_barrier
memory model
to improve performance, the hardware (cpu,
caches, …) reorders memory access using its
own memory model (set of rules)* dynamically
the visibility of a value in a memory location is
further complicated by the code reordering
performed by the compiler statically
http://en.wikipedia.org/wiki/Memory_ordering
memory model
the static and dynamic reordering strive to
ensure an ‘as-if serial’ semantics
i.e., the program appears to be executing
sequentially as per the lines in your source
code
memory model
memory reordering is transparent in single-
threaded use-cases because of that as-if-
serial guarantee
but logic quickly falls apart and causes
surprises in incorrectly synchronized multi-
threaded programs
memory model
while jvm’s OOTA safety (out of thin air)
guarantees that a thread always reads a value
written by *some* thread, and not some value
out of thin air…
with all the reordering, it’s good to have a
slightly stronger guarantee …
the need for memory barriers
in the following code, say reader is called after writer
(from different threads)

class Reordering {
  int x = 0, y = 0;

  public void writer() {
    x = 1;
    y = 2;
  }

  public void reader() {
    int r1 = y;
    int r2 = x;
    // use r1 and r2
  }
}
in reader, even if r1 == 2, r2 can be 0 or 1
to control the ordering (and ensure r2 == 1), we need
synchronization that introduces a memory barrier
memory barrier
the jvm memory model essentially defines the
relationship between the variables in your
code
the semantics also define a partial ordering on
the memory operations so certain actions are
guaranteed to ‘happen before’ others
happens-before
happens-before is a visibility guarantee for
memory provided through synchronization
such as locking, volatiles, atomics, etc
…and for completeness, through Thread
start() & join()
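to make the Thread start() & join() case concrete, here is a minimal sketch (the class name is hypothetical, not from the slides) — the write to a plain field in the worker thread is guaranteed to be visible after join() returns:

  class JoinVisibility {
    int result = 0; // deliberately non-volatile

    void compute() throws InterruptedException {
      Thread worker = new Thread(() -> result = 42); // write happens in the worker thread
      worker.start(); // start() happens-before the worker's first action
      worker.join();  // the worker's last action happens-before join() returning
      System.out.println(result); // guaranteed to print 42
    }
  }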
Concurrency control on jvm with
JDK 8
with that background, let’s look at some
specific tools & mechanisms available on the
jvm & jdk 8..
Concurrency control on jvm with
JDK 8
volatiles
atomics
concurrent collections/data-structures
synchronizers
fork/join framework
volatiles
volatiles are typically used as state variables
across threads
writing to & reading from a volatile is like releasing
and acquiring a monitor (lock), respectively
i.e., it guarantees a happens-before relationship
not just with other volatile memory but also with
non-volatile memory
volatiles
typical use of volatiles with reader and writer called from
different threads:

class VolatileExample {
  int x = 0;
  volatile boolean v = false;

  public void writer() {
    x = 42;
    v = true;
  }

  public void reader() {
    if (v == true) {
      // uses x - guaranteed to see 42.
    }
  }
}
the happens-before guarantee in jvm memory model makes it
simpler to reason about the value in x, even though x is non-
volatile!
code: https://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html
volatiles
guaranteeing happens-before relationship for
non-volatile memory is a performance
overhead, so like any other synchronization
primitive, it must be used judiciously
but it greatly simplifies the program by aligning
the dynamic and static reordering with most
programmers’ expectations
atomics
atomics* extend the notion of volatiles, and support
conditional updates
being an extension to volatiles, they guarantee a
happens-before relationship on memory operations
the updates are performed through a CAS cpu
instruction
* http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/package-summary.html
atomics
atomics/cas allow designing non-blocking
algorithms where the critical section is around
a single variable
if there is more than one variable, other forms
of synchronization are needed
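as a small illustration of a single-variable non-blocking update (a minimal sketch; the class is hypothetical, not from the slides), the classic CAS retry loop with AtomicInteger:

  import java.util.concurrent.atomic.AtomicInteger;

  class LockFreeCounter {
    private final AtomicInteger count = new AtomicInteger(0);

    int increment() {
      for (;;) {
        int current = count.get();  // read the current value
        int next = current + 1;     // compute the new value
        if (count.compareAndSet(current, next)) {
          return next;              // our CAS won; the update is visible to other threads
        }
        // another thread updated count first; loop and retry
      }
    }
  }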
CAS
JDK 8 uses CAS for ‘lock-free’ operation
at a high level, it piggybacks on a cpu-provided
CAS* instruction — like lock cmpxchg on
x86
let’s see how jvm dynamically improves the
performance of the hardware provided CAS
*CAS: http://en.wikipedia.org/wiki/Compare-and-swap
CAS/atomics
CAS in recent cpu implementations doesn’t assert the lock# signal to gain
exclusive bus access, but rather relies on efficient cache-coherence
protocols* — unless the memory operand spans a cache-line boundary
even though that helps CAS scale on many-core systems, CAS still
adds a lot to local latency, sometimes nearly stalling the cpu
to address that local latency, a biased-locking* approach is used
— where uncontended uses of atomics are recompiled
dynamically to not use CAS instructions!
* more about MESI: https://courses.engr.illinois.edu/cs232/sp2009/lectures/x24.pdf

* biased locking in jvm: https://blogs.oracle.com/dave/entry/biased_locking_in_hotspot
biased-locking
the biased-locking feature in jvm extends
beyond atomics, and generalizes to different
kinds of locking (monitor entry & exit) on the
jvm
atomics
before we move on, JDK 7 also provides
‘weakCompareAndSet’ atomic api, which relaxes the
happens-before ordering guarantee
relaxing the ordering makes it very hard to reason
about the program’s execution so its use is limited to
debugging counters, etc
there are better ways of doing this ‘fast’ — which
brings us to…
adders & accumulators
under high contention, if we used atomics, the biased-locking
machinery would spend too much time revoking the bias from a
thread
in these high-contention situations, adders* help
gather counts by actively reducing contention, and
‘gather’ the value only when sum() or longValue() is
called
* http://download.java.net/lambda/b78/docs/api/java/util/concurrent/atomic/LongAdder.html
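a minimal usage sketch (the class and method names here are hypothetical):

  import java.util.concurrent.atomic.LongAdder;

  class RequestStats {
    private final LongAdder requests = new LongAdder();

    void onRequest() {
      requests.increment(); // contended updates are spread across internal cells
    }

    long total() {
      return requests.sum(); // the cells are 'gathered' only here
    }
  }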
concurrent collections
the JDK also comes with a handful of lock-
free collections
these help in correctly synchronizing larger
data sets than single variables
concurrent collections
ConcurrentHashMap (CHM) uses some of
the concepts listed so far and provides a lock-
free read, and a mostly lock-free write in java 8
it relies on a good hashCode to reduce
collisions; when they do occur, it falls back to a
lock for that bin
concurrent collections
CHM — in general — allows concurrent use of
a Map, which can be pretty useful, especially to
represent shared ‘mutating’ state
CHM, together with adders for example,
enables concurrent, lock-free histogram
generation across threads (a sketch follows below)
more about CHM here, of course: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html
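a minimal sketch of that histogram idea (the class is hypothetical; it relies on java 8’s computeIfAbsent):

  import java.util.concurrent.ConcurrentHashMap;
  import java.util.concurrent.atomic.LongAdder;

  class Histogram {
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    // called concurrently from many threads; no explicit locks in this code
    void record(String bucket) {
      counts.computeIfAbsent(bucket, k -> new LongAdder()).increment();
    }

    long countFor(String bucket) {
      LongAdder adder = counts.get(bucket);
      return adder == null ? 0L : adder.sum();
    }
  }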
synchronizers
let’s look at some synchronization primitives…
(a.k.a. ‘source of bugs’)
synchronizers
2 major categories…
coarse-grained locks are usually less
performant, but are easy to code
and, fine-grained locking has potential for
higher performance, but is more error prone
synchronized
the synchronized keyword is a coarse-grained locking
scheme
you acquire & release locks at method or block level,
typically holding the lock longer than needed
it translates directly to jvm synchronization (intrinsic) &
the hardware monitor
so its use is currently discouraged (might change in java 9)
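for reference, the two coarse-grained forms (method-level and block-level) look like this — a minimal sketch with a hypothetical class:

  class CoarseGrainedCounter {
    private int count = 0;

    synchronized void increment() {  // intrinsic lock held for the whole method
      count++;
    }

    void incrementBlock() {
      synchronized (this) {          // block form of the same intrinsic lock
        count++;
      }
    }
  }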
explicit locks
Locks* enable fine-grained locking
they extend the capabilities of intrinsic locks, allowing unconditional,
polled, timed & interruptible lock acquisition
they allow ‘custom’ wait/notify queues (Condition*) on the
same lock
nice features, but …
* http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/Lock.html

* http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Condition.html
explicit locks
the developer needs to remember to release locks, so the
following style is encouraged:

Lock l = ...;
l.lock();
try {
  // access the resource protected by this lock
} finally {
  l.unlock();
}
it gets *very* complicated when we have to deal with
more than 1 lock
…source of all kinds of bugs & surprises
* http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/Lock.html
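for completeness, here is what the Condition feature mentioned earlier (a ‘custom’ wait/notify queue on the same lock) looks like in use — a minimal sketch with hypothetical class and method names:

  import java.util.ArrayDeque;
  import java.util.concurrent.locks.Condition;
  import java.util.concurrent.locks.Lock;
  import java.util.concurrent.locks.ReentrantLock;

  class HandOffQueue<T> {
    private final Lock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition(); // wait queue tied to this lock
    private final ArrayDeque<T> items = new ArrayDeque<>();

    void put(T item) {
      lock.lock();
      try {
        items.addLast(item);
        notEmpty.signal(); // wake one waiting taker
      } finally {
        lock.unlock();
      }
    }

    T take() throws InterruptedException {
      lock.lock();
      try {
        while (items.isEmpty()) {
          notEmpty.await(); // releases the lock while waiting, reacquires on wakeup
        }
        return items.removeFirst();
      } finally {
        lock.unlock();
      }
    }
  }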
ReentrantLock
an implementation of Lock described earlier
supports a fairness policy to deal with lock
starvation — ‘fair’, not ‘fast’
there is nothing special in this lock that makes it
‘reentrant’; all intrinsic locks on the jvm are per-thread and
reentrant, unlike POSIX invocation-based locks
* http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantLock.html
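the fairness policy is chosen at construction time — a small sketch (the factory class is hypothetical):

  import java.util.concurrent.locks.ReentrantLock;

  class Locks {
    static ReentrantLock throughputLock() {
      return new ReentrantLock();     // default (non-fair): faster, but waiters may starve
    }
    static ReentrantLock fairLock() {
      return new ReentrantLock(true); // roughly FIFO hand-off: 'fair', not 'fast'
    }
  }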
a note about reentrancy
reentrancy helps encapsulate locking behavior & helps write
cleaner (oop) concurrent code
in simpler cases (using a single ‘resource’ but multiple methods)
this also helps avoid deadlocks:

class A {
  synchronized void run() {
    // ..
  }
}

class B extends A {
  synchronized void run() {
    super.run();
  }
}
if intrinsic locks were not reentrant on jvm, the call to
super.run() would be deadlocked
ReentrantLock
ReentrantLock (not reentrancy in general)
has some issues, so it must be used with
caution:
it can cause starvation in its default (non-fair) mode, and
performs poorly when fairness is enabled
StampedLock
supports optimistic reads & lock upgrades
is not reentrant — it needs the stamp, so it is not
usable across calls to unknown methods
intended for internal use in thread-safe components,
where you fully understand the data, objects
& methods involved
StampedLock
for very short read-only code, optimistic
reads improve throughput by reducing
contention
useful when reading multiple fields of an
object from memory without locking
must call validate() later to ensure consistency
StampedLock
along with optimistic reads, the lock upgrade
capability enables many useful idioms:

StampedLock sl = new StampedLock();
double x, y;
..

double distanceFromOrigin() { // A read-only method
  long stamp = sl.tryOptimisticRead();
  double currentX = x, currentY = y; // read without locking
  if (!sl.validate(stamp)) {
    stamp = sl.readLock(); // upgrade to read-lock if values are dirty
    try {
      currentX = x;
      currentY = y;
    } finally {
      sl.unlockRead(stamp);
    }
  }
  return Math.sqrt(currentX * currentX + currentY * currentY);
}
fork/join
unlike regular java.lang.Thread (which is mostly
based on POSIX threads), fork/join tasks never
‘block’
for simple tasks, the overhead of constructing and/or
managing a thread is greater than the cost of the task
itself
programming on fork/join, in essence, allows
frameworks to optimize such tasks ‘behind the scenes’
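a minimal sketch of a fork/join task (the class name and threshold are illustrative) that splits a summation until the pieces are small enough to compute directly:

  import java.util.concurrent.ForkJoinPool;
  import java.util.concurrent.RecursiveTask;

  class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final long[] values;
    private final int from, to;

    SumTask(long[] values, int from, int to) {
      this.values = values;
      this.from = from;
      this.to = to;
    }

    @Override
    protected Long compute() {
      if (to - from <= THRESHOLD) {          // small enough: sum directly
        long sum = 0;
        for (int i = from; i < to; i++) sum += values[i];
        return sum;
      }
      int mid = (from + to) >>> 1;
      SumTask left = new SumTask(values, from, mid);
      SumTask right = new SumTask(values, mid, to);
      left.fork();                            // schedule the left half asynchronously
      return right.compute() + left.join();   // compute the right half, then join the left
    }

    static long sum(long[] values) {
      return ForkJoinPool.commonPool().invoke(new SumTask(values, 0, values.length));
    }
  }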
fork/join
going beyond performance, the framework does
nothing to ensure concurrency control
the framework is also only usable in the few
scenarios where a task can be easily decomposed
in that sense, it does not make it easier to create
correct (and fast) programs
lambdas & streams
a framework available in jdk 8 for data-
processing workloads
looks ‘functional’ — but due to type erasure the
generic types aren’t reified at runtime
lambdas ‘look’ like anonymous inner classes but are
fundamentally different from the ground up —
enabling jvm optimizations for concurrency & gc
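a minimal example of a stream pipeline (the data and predicate are illustrative) that can run in parallel without any explicit threads or locks:

  import java.util.Arrays;
  import java.util.List;

  class WordStats {
    static long longWordCount(List<String> words) {
      return words.parallelStream()            // same pipeline, parallel execution
                  .filter(w -> w.length() > 7)
                  .count();
    }

    public static void main(String[] args) {
      List<String> words = Arrays.asList("concurrency", "lock", "transactional", "memory");
      System.out.println(longWordCount(words)); // prints 2
    }
  }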
lock-free
we’ve looked at a few lock-free concepts at a
single-variable level, using CAS
and atomics, which rely on CAS
and optimizations to make CAS faster…
lock-free
but how do we write ‘real-world’ concurrent
applications using lock-free concepts?
i.e., more than just CAS?
lock-free
that brings us to software transactional
memory (STM)!
STM is to concurrency control, what garbage-
collection is to memory management
STM
brings the DB transaction concept to regular
memory access
read & write ‘as if’ there were no contention…
at commit time the system ensures consistency
under the hood
… no locks in the code!
STM
in low contention use-cases (i.e., well-
designed programs), the absence of
synchronization makes execution very fast!
even in poorly designed programs, the
absence of locks makes it easier to focus on
correctness
STM implementation
multiverse[1] is a popular jvm implementation of
STM (groovy and Scala/Akka use it in their STM)
in essence, multiverse implements multiversion
concurrency control (MVCC[2])
Clojure has STM built into the language
[1] http://multiverse.codehaus.org/overview.html 

[2] http://en.wikipedia.org/wiki/Multiversion_concurrency_control
STM & composability
the biggest benefit of STM is composability (software
reuse)

class Account {
  private final TxnRef<Date> lastUpdate = …;
  private final TxnInteger balance = …;

  public void incBalance(int amount, Date date) {
    atomic(new Runnable() {
      public void run() {
        balance.inc(amount);
        lastUpdate.set(date);
        if (balance.get() < 0) {
          throw new IllegalStateException("Not enough money");
        }
      }
    });
  }
}

class Teller {
  static void transfer(Account from, Account to, int amount) {
    atomic(new Runnable() {
      public void run() {
        Date date = new Date();
        from.incBalance(-amount, date);
        to.incBalance(amount, date);
      }
    });
  }
}
STM & composability
the Teller class is able to ‘compose’ over other
atomic operations without knowing their internal
details (i.e., what locks they use to synchronize)
so if to.incBalance() fails, the memory effects of
from.incBalance() are not committed and so will
never be visible to other threads!
this is a pretty big deal…
Simplicity
STM makes composing concurrent software
modules look almost trivial
in the absence of locks, it is easier to
conceptualize the code flow
the ability to code atomic operations this way
essentially nullifies the challenges typically
associated with concurrent programming
performance
as stated earlier, stm allows optimistic execution: ‘as
though’ there are no other threads running, so it
increases concurrency
STM synchronizes only when required and falls back to
slower (serialized) executions when necessary
STM performs better than explicit locks as the number
of cores increases beyond 4*
* http://en.wikipedia.org/wiki/Software_transactional_memory

http://channel9.msdn.com/Shows/Going+Deep/Programming-in-the-Age-of-Concurrency-Software-Transactional-Memory
more performance
apart from just software improvements, cpu
makers have started looking into hardware
support for TM
this is an emerging area, and more advances are
being made beyond Intel’s TSX, introduced with
Haswell
* https://software.intel.com/en-us/blogs/2012/02/07/transactional-synchronization-in-haswell

* http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions
STM & Practicality
concurrent programming is getting more
practical
stm brings the benefits of fine-grained locking
to coarse-grained code, without using locks
Summary
lock-free concurrency control techniques like
STM not only make it easier to write correct
code…
but also allow platforms (like the JVM) to make
your correct code run faster
References
Since this is a long slideshow with dense content,
I’ve put references on each slide so you can
read through them
Reach out to me on LinkedIn if you’d like more
info or just to discuss!
