Understanding the Disruptor


A Beginner's Guide to Hardcore Concurrency
Why is concurrency so difficult?
Ordering

Program Order:      Execution Order (maybe):

int w = 10;         int x = 20;
int x = 20;         int y = 30;
int y = 30;         int b = x * y;
int z = 40;         int w = 10;
int a = w + z;      int z = 40;
int b = x * y;      int a = w + z;
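
The reordering becomes observable once a second thread is watching. Below is a minimal, self-contained sketch (my own illustration, not from the slides): with no synchronization, the reader may see ready == true while still seeing the stale value.

public class ReorderingSketch {

  static int value = 0;
  static boolean ready = false;

  // Writer thread: program order is value first, then ready.
  static void writer() {
    value = 42;
    ready = true;
  }

  // Reader thread: with no synchronization the Java Memory Model allows
  // these writes to become visible out of program order, so this may
  // legally print 0.
  static void reader() {
    if (ready) {
      System.out.println(value);
    }
  }

  public static void main(String[] args) {
    new Thread(ReorderingSketch::writer).start();
    new Thread(ReorderingSketch::reader).start();
  }
}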
Visibility
Why should we care about the details?
Increment a Counter


static long foo = 0;

private static void increment() {
  for (long l = 0; l < 500000000L; l++) {
    foo++;
  }
}
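
The timing slides that follow also quote a volatile variant which is never shown as code. A minimal sketch of what that benchmark presumably looks like (the field is simply declared volatile; everything else is unchanged):

static volatile long foo = 0;

private static void increment() {
  for (long l = 0; l < 500000000L; l++) {
    foo++;   // still a read-modify-write, now with a volatile store each time
  }
}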
Using a Lock

public static long foo = 0;
public static Lock lock = new ReentrantLock();

private static void increment() {
  for (long l = 0; l < 500000000L; l++) {
    lock.lock();
    try {
        foo++;
    } finally {
        lock.unlock();
    }
  }
}
Using an AtomicLong


static AtomicLong foo = new AtomicLong(0);

private static void increment() {
  for (long l = 0; l < 500000000L; l++) {
    foo.getAndIncrement();
  }
}
The Cost of Contention
         Increment a counter 500 000 000 times.

● One Thread             :     300 ms
● One Thread (volatile)  :   4 700 ms (15x)
● One Thread (Atomic)    :   5 700 ms (19x)
● One Thread (Lock)      :  10 000 ms (33x)
● Two Threads (Atomic)   :  30 000 ms (100x)
● Two Threads (Lock)     : 224 000 ms (746x)
                           ^^^^^^^^
                           ~4 minutes!!!
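
The slides don't show the benchmark harness itself; the sketch below is my own rough reconstruction of how such numbers could be gathered for the two-thread Atomic case (class and method names are assumptions, and absolute times depend entirely on the hardware):

import java.util.concurrent.atomic.AtomicLong;

public class ContentionBenchmark {

  static final long ITERATIONS = 500000000L;
  static final AtomicLong foo = new AtomicLong(0);

  static void increment(long count) {
    for (long l = 0; l < count; l++) {
      foo.getAndIncrement();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    int threads = 2;   // both threads hammer the same AtomicLong
    long start = System.currentTimeMillis();

    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> increment(ITERATIONS / threads));
      workers[i].start();
    }
    for (Thread worker : workers) {
      worker.join();
    }

    System.out.println("Took " + (System.currentTimeMillis() - start) + " ms");
  }
}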
Parallel v. Serial - String Splitting

Guy Steele @ Strange Loop:

http://www.infoq.com/presentations/Thinking-Parallel-Programming

Scala Implementation and Brute Force version in Java:

https://github.com/mikeb01/folklore/
Performance Test



Parallel (Scala) :  440 ops/sec
Serial (Java)    : 1768 ops/sec
CPUs Are Getting Faster

     Single threaded string split on different CPUs
What problem were we trying to solve?
Classic Approach to the Problem
The Problems We Found
Why Queues Suck
Why Queues Suck - Linked List
Contention Free Design
Now our Pipeline Looks Like...
How Fast Is It - Throughput
How Fast Is It - Latency

                                       ABQ    Disruptor

 Min Latency (ns)                      145           29
 Mean Latency (ns)                  32 757           52
 99 Percentile Latency (ns)      2 097 152          128
 99.99 Percentile Latency (ns)   4 194 304        8 192
 Max Latency (ns)                5 069 086      175 567
How does it all work?
Ordering and Visibility

 public class RingBuffer {

   private static final int SIZE = 32;
   private final Object[] data = new Object[SIZE];
   private volatile long sequence = -1;
   private long nextValue = -1;

   // Single writer: store the data first, then publish it by writing the
   // volatile sequence, which orders the stores and makes them visible.
   public void publish(Object value) {
     long index = ++nextValue;
     data[(int)(index % SIZE)] = value;
     sequence = index;
   }

   // Reader: the volatile read of sequence guarantees that if the index
   // has been published, the matching write to data[] is visible too.
   public Object get(long index) {
     if (index <= sequence) {
        return data[(int)(index % SIZE)];
     }
     return null;
   }
 }
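
For context, here is a minimal sketch of driving this ring buffer from one producer and one consumer thread (the spin-wait loop and thread setup are my own illustration and ignore wrap-around, which the real Disruptor handles with consumer sequences):

public class RingBufferDemo {

  public static void main(String[] args) throws InterruptedException {
    RingBuffer buffer = new RingBuffer();

    Thread producer = new Thread(() -> {
      for (int i = 0; i < 10; i++) {
        buffer.publish("event-" + i);
      }
    });

    Thread consumer = new Thread(() -> {
      long next = 0;
      while (next < 10) {
        Object value = buffer.get(next);
        if (value != null) {          // null means not published yet
          System.out.println(value);
          next++;
        }                             // otherwise spin until it is visible
      }
    });

    consumer.start();
    producer.start();
    producer.join();
    consumer.join();
  }
}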
Ordering and Visibility - Store

mov $0x1,%ecx
add 0x18(%rsi),%rcx ;*ladd
;...
lea (%r12,%r8,8),%r11 ;*getfield data
;...
mov %r12b,(%r11,%r10,1)
mov %rcx,0x10(%rsi)
lock addl $0x0,(%rsp) ;*ladd
Ordering and Visibility - Load

mov %eax,-0x6000(%rsp)
push %rbp
sub $0x20,%rsp       ;*synchronization entry
             ; - RingBuffer::get@-1 (line 17)
mov 0x10(%rsi),%r10 ;*getfield sequence
             ; - RingBuffer::get@2 (line 17)
cmp %r10,%rdx
jl 0x00007ff92505f22d ;*iflt
             ; - RingBuffer::get@6 (line 17)
mov %edx,%r11d ;*l2i ; - RingBuffer::get@14 (line 19)
Look Ma' No Memory Barrier


AtomicLong sequence = new AtomicLong(-1);

public void publish(Object value) {
  long index = ++nextValue;
  data[(int)(index % SIZE)] = value;
  sequence.lazySet(index);
}
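
Only the store side changes; reads are unaffected. A sketch of the matching get, assuming the same data[] and SIZE fields as in the earlier RingBuffer: AtomicLong.get() is an ordinary volatile read, while lazySet drops the trailing StoreLoad barrier (the lock addl) seen in the store assembly from the publish path.

public Object get(long index) {
  if (index <= sequence.get()) {   // plain volatile-read semantics
    return data[(int)(index % SIZE)];
  }
  return null;
}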
False Sharing - Hidden Contention
Cache Line Padding

public class PaddedAtomicLong extends AtomicLong {

    public volatile long p1, p2, p3, p4, p5, p6 = 7L;

    //... lines omitted

    public long sumPaddingToPreventOptimisation() {
      return p1 + p2 + p3 + p4 + p5 + p6;
    }
}
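
As a usage sketch (my own illustration, not from the slides): two threads each incrementing their own counter no longer contend once the counters cannot share a cache line, which is exactly what the padding buys. This assumes the PaddedAtomicLong above is on the classpath.

public class FalseSharingSketch {

  // With plain AtomicLongs these two fields could sit on the same cache
  // line and falsely share; the padded version keeps them apart.
  static final PaddedAtomicLong counterA = new PaddedAtomicLong();
  static final PaddedAtomicLong counterB = new PaddedAtomicLong();

  public static void main(String[] args) throws InterruptedException {
    Thread t1 = new Thread(() -> {
      for (long l = 0; l < 500000000L; l++) {
        counterA.incrementAndGet();
      }
    });
    Thread t2 = new Thread(() -> {
      for (long l = 0; l < 500000000L; l++) {
        counterB.incrementAndGet();
      }
    });
    t1.start(); t2.start();
    t1.join(); t2.join();
  }
}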
In Summary

● Concurrency is a tool
● Ordering and visibility are the key challenges
● For performance the details matter
● Don't believe everything you read
   ○ Come up with your own theories and test them!
Q&A

recruitment@lmax.com
