HIGH PERFORMANCE SYSTEMS
WITHOUT TEARS
ZAHARI DICHEV
SCALA DAYS BERLIN 2018
EXCESSIVE OBJECT INSTANTIATION
THINGS TO KEEP IN MIND
▸ Actual object instantiation costs time
▸ Creates a lot more work for the Garbage Collector
▸ May introduce systemic pauses in your application
▸ Introduces non-determinism
THINGS TO KEEP IN MIND
▸ A fair amount of systems implement GC-free data
structures living entirely off-heap
▸ Some commercial solutions provide pause-free
GC implementations (Azul Zing)
▸ There are even proposals to introduce a No-Op
GC in Java
EXTRACTOR OBJECTS
EXTRACTORS IN A NUTSHELL
▸ An object with an unapply method
▸ Takes an object and tries to give back its
components
▸ Useful in pattern matching, partial functions, etc
▸ A great tool for achieving brevity and
expressiveness
EXTRACTING BLACK ROOKS
case class Rook(x: Int,
                y: Int,
                isBlack: Boolean)
▸ We simply need to go through a board of rooks
▸ Match on all rooks that are black
▸ Extract the x and y coordinates
EXTRACTING BLACK ROOKS
object BlackRook {
  def unapply(rook: Rook): Option[(Int, Int)] =
    if (rook.isBlack) Some((rook.x, rook.y)) else None
}
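For illustration, the extractor above could be used at a call site as follows. This is a minimal sketch: the `BlackRookDemo` wrapper, the `blackCoordinates` helper and the sample board are hypothetical names that do not appear on the slides.

```scala
object BlackRookDemo {
  final case class Rook(x: Int, y: Int, isBlack: Boolean)

  // Option-based extractor, as on the slide: succeeds only for black rooks.
  object BlackRook {
    def unapply(rook: Rook): Option[(Int, Int)] =
      if (rook.isBlack) Some((rook.x, rook.y)) else None
  }

  // Collects the coordinates of every black rook on the board.
  def blackCoordinates(board: Seq[Rook]): Seq[(Int, Int)] =
    board.collect { case BlackRook(x, y) => (x, y) }

  // Hypothetical sample board, for illustration only.
  val sampleBoard: Seq[Rook] = Seq(
    Rook(0, 0, isBlack = true),
    Rook(3, 1, isBlack = false),
    Rook(7, 7, isBlack = true)
  )
}
```

Every successful match goes through `unapply`, so each black rook costs one `Some` and one tuple allocation.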
EXTRACTING BLACK ROOKS
public unapply(Lextractors/Rook;)Lscala/Option;
L0
LINENUMBER 5 L0
ALOAD 1
INVOKEVIRTUAL extractors/Rook.isBlack ()Z
IFEQ L1
NEW scala/Some
DUP
NEW scala/Tuple2$mcII$sp
DUP
. . .
object BlackRook {
def unapply(rook: Rook): Option[(Int, Int)] =
if (rook.isBlack) Some((rook.x, rook.y)) else None
}
ALLOCATION FREE EXTRACTORS
▸ Name-based extractors introduced in 2.11
▸ Returning Option is no longer needed
▸ We need to return an object defining two methods
def isEmpty: Boolean = . . .
def get: T = . . .
ALLOCATION-FREE EXAMPLE
object BlackRookNameBased {
  class Extractor[T <: AnyRef](val extraction: T) extends AnyVal {
    def isEmpty: Boolean = extraction eq null
    def get: T = extraction
  }

  def unapply(rook: Rook): Extractor[(Int, Int)] =
    if (rook.isBlack)
      new Extractor((rook.x, rook.y))
    else
      new Extractor(null)
}
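A call site for the name-based extractor looks the same as for the Option-based one. The sketch below repeats the slide's extractor so that it is self-contained; the `NameBasedDemo` wrapper and the `sumOfBlackX` helper are hypothetical names added for illustration.

```scala
object NameBasedDemo {
  final case class Rook(x: Int, y: Int, isBlack: Boolean)

  object BlackRookNameBased {
    // Value class: the wrapper is normally erased to the wrapped reference.
    class Extractor[T <: AnyRef](val extraction: T) extends AnyVal {
      def isEmpty: Boolean = extraction eq null
      def get: T = extraction
    }

    def unapply(rook: Rook): Extractor[(Int, Int)] =
      if (rook.isBlack) new Extractor((rook.x, rook.y))
      else new Extractor[(Int, Int)](null)
  }

  // Sums the x coordinates of all black rooks with a plain while loop.
  def sumOfBlackX(board: Array[Rook]): Int = {
    var sum = 0
    var i = 0
    while (i < board.length) {
      board(i) match {
        case BlackRookNameBased(x, _) => sum += x
        case _                        => ()
      }
      i += 1
    }
    sum
  }
}
```

The pattern syntax is unchanged; only the return type of `unapply` differs. The tuple is still allocated for black rooks, but no `Some` wrapper is created.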
ALLOCATION-FREE EXAMPLE (BYTECODE)
public unapply(Lextractors/Rook;)Lscala/Tuple2;
L0
ALOAD 1
INVOKEVIRTUAL extractors/Rook.isBlack ()Z
IFEQ L1
L2
NEW scala/Tuple2$mcII$sp
DUP
ALOAD 1
INVOKEVIRTUAL extractors/Rook.x ()I
ALOAD 1
INVOKEVIRTUAL extractors/Rook.y ()I
INVOKESPECIAL scala/Tuple2$mcII$sp.<init> (II)V
GOTO L3
L1
. . .
EXECUTION TIME COMPARISON
[Bar chart: execution time (ms) by board size (512, 1024, 2048, 4096), comparing name-based and default extractors]
GC STATS
▸ -XX:+PrintGCApplicationStoppedTime
Enables printing of how long each pause (for example, a GC pause) lasted.
▸ -XX:+PrintGCApplicationConcurrentTime
Enables printing of the time elapsed since the last pause (for example, a GC pause).
▸ -XX:+PrintGCTimeStamps
Enables printing of a time stamp at every GC.
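Besides the log flags, the JVM exposes cumulative GC counters through the standard `java.lang.management` API, which can be handy for a quick before/after comparison inside a benchmark. A minimal sketch follows; `GcCounters` and `GcSnapshot` are hypothetical names, not something used in the talk.

```scala
import java.lang.management.ManagementFactory

object GcCounters {
  final case class GcSnapshot(collector: String, collections: Long, totalMillis: Long)

  // Reads the cumulative counters of every collector the running JVM exposes.
  def snapshot(): List[GcSnapshot] = {
    val beans = ManagementFactory.getGarbageCollectorMXBeans().iterator()
    var result = List.empty[GcSnapshot]
    while (beans.hasNext) {
      val bean = beans.next()
      // Either counter is -1 when the collector does not report it.
      result = GcSnapshot(bean.getName(), bean.getCollectionCount(), bean.getCollectionTime()) :: result
    }
    result.reverse
  }
}
```

Taking one snapshot before and one after the code under test gives the number of collections and the accumulated collection time it caused.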
HEAP STATS (DEFAULT)
num   #instances   #bytes    class name
----------------------------------------------
1:    262144       376704    scala.Tuple2$mcII$sp
2:    262144       6291456   extractors.Rook
3:    197796       3164736   java.lang.Integer
4:    11931        286344    java.lang.String
5:    17592        281472    scala.Some
HEAP STATS (NAME BASED)
num   #instances   #bytes    class name
----------------------------------------------
1:    262144       376704    scala.Tuple2$mcII$sp
2:    262144       6291456   extractors.Rook
3:    197796       3164736   java.lang.Integer
4:    11949        286776    java.lang.String
5:    3108         174048    jdk.internal.org.objectweb.asm.Item
ALL MEMORY IS NOT CREATED EQUAL
HOW LATENCY IS MASKED
▸ Modern CPUs have a multitude of caches
▸ These caches vary in size and latency
▸ Main purpose is to mask latency and ensure our
CPU’s progress is not severely hindered by main
memory latency
▸ A cache miss is one of the most prominent
performance killers
HIERARCHY OF CACHES
▸ L1 - core-local cache split into separate 32 KB data and 32 KB instruction caches. (1.5 ns)
▸ L2 - core-local cache of 256 KB. Contains both data and instructions. (5 ns)
▸ L3 - typically 6 MB. Shared between cores. (16 - 25 ns)
▸ RAM - large in size. (60 ns)
MEMORY HIERARCHY EXAMPLE
CORE L1 L2
CORE L1 L2
L3 MAIN MEMORY
MATRIX TRANSPOSITION
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
TRANSPOSED
MATRIX TRANSPOSITION
def transpose: Matrix = {
  val newMatrix = Matrix.empty(rows)
  // Writes the destination sequentially but reads the source with a stride of `cols`.
  for {i <- 0 until rows}
    for {j <- 0 until cols}
      newMatrix.data.update(j + i * rows, this.data(i + j * cols))
  newMatrix
}
MATRIX TRANSPOSITION
▸ We have cache and memory
▸ Latency to memory is significantly higher
▸ We load data in cache lines of size 32 bytes (4 longs)
▸ Cache size is one line
DATA ACCESSES
1 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
TRANSPOSED
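The simplified model from the slides (32-byte cache lines holding 4 longs, a cache of exactly one line) is small enough to simulate. The sketch below counts how many line loads a sequence of source reads would trigger; `OneLineCacheModel` and its helpers are hypothetical names, and the model ignores writes and everything else a real cache does.

```scala
object OneLineCacheModel {
  val LongsPerLine = 4 // 32-byte line / 8-byte long, as assumed on the slide

  // Counts the accesses that fall outside the single line currently cached.
  def misses(accesses: Seq[Int]): Int = {
    var cachedLine = -1
    var count = 0
    accesses.foreach { index =>
      val line = index / LongsPerLine
      if (line != cachedLine) {
        count += 1
        cachedLine = line
      }
    }
    count
  }

  // Source indices read by a column-by-column walk of an n x n row-major matrix.
  def columnWise(n: Int): Seq[Int] =
    for (j <- 0 until n; i <- 0 until n) yield i * n + j

  // Source indices read by a row-by-row walk of the same matrix.
  def rowWise(n: Int): Seq[Int] =
    for (i <- 0 until n; j <- 0 until n) yield i * n + j
}
```

For the 4 x 4 example, the column-wise walk (values 1 5 9 13 2 6 …) changes line on every read, while the row-wise walk loads each line exactly once.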
BETTER ACCESS PATTERN
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
TRANSPOSED
1 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16
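The slides do not show the cache-friendly implementation itself. One common way to get a better access pattern is loop tiling, sketched below under that assumption: the matrix is processed in small square tiles so that the lines touched in both source and destination are reused before they are evicted. `BlockedTranspose` and the default `block` size are illustrative choices, not values from the talk.

```scala
object BlockedTranspose {
  // Transposes an n x n row-major matrix tile by tile.
  // `block` is a tuning assumption: pick it so that two tiles fit in cache.
  def transpose(src: Array[Long], n: Int, block: Int = 32): Array[Long] = {
    require(src.length == n * n, "expected a square n x n matrix")
    require(block > 0, "block must be positive")
    val dst = new Array[Long](n * n)
    var bi = 0
    while (bi < n) {
      var bj = 0
      while (bj < n) {
        val iEnd = math.min(bi + block, n)
        val jEnd = math.min(bj + block, n)
        var i = bi
        while (i < iEnd) {
          var j = bj
          while (j < jEnd) {
            dst(j * n + i) = src(i * n + j)
            j += 1
          }
          i += 1
        }
        bj += block
      }
      bi += block
    }
    dst
  }
}
```

The best tile size depends on the cache sizes of the target machine, so it should be measured rather than guessed.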
EXECUTION TIME COMPARISON
[Bar chart: execution time (ms) by matrix size (side: 2048, 4096, 8192, 16384), comparing cache-friendly and naive transposition]
TOOLS TO MONITOR CACHE STATS
▸ Intel VTune
https://software.intel.com/en-us/intel-vtune-amplifier-xe
▸ Likwid
https://github.com/RRZE-HPC/likwid
▸ perf stat
https://perf.wiki.kernel.org
▸ Intel's PCM
https://github.com/opcm/pcm
INTEL OPEN PCM
Core (SKT) | L3MISS | L2MISS | L3HIT | L2HIT | L3MPI | L2MPI | TEMP
  0    0     8184 K   1993 K   0.30    0.43    0.01    0.01    35
  1    0     1448 K   1982 K   0.27    0.24    0.01    0.01    35
  2    0     4550 K   7100 K   0.36    0.46    0.00    0.00    33
  3    0     1191 K   1658 K   0.28    0.34    0.00    0.01    33
  4    0     9316 K     12 M   0.24    0.12    0.01    0.01    30
  5    0     1042 K   1451 K   0.28    0.26    0.01    0.01    30
  6    0     6700 K   9294 K   0.28    0.37    0.00    0.01    25
  7    0     1013 K   1527 K   0.34    0.27    0.01    0.01    25
---------------------------------------------------------------------
L2 CACHE MISSES
[Bar chart: L2 cache misses by matrix size (side: 2048, 4096, 8192, 16384), comparing cache-friendly and naive transposition]
CACHE OBLIVIOUS ALGORITHMS
▸ A bit of a misleading name…
▸ An algorithm designed to take advantage of the
underlying memory hierarchy
▸ No need to know details of the cache (size, length
of the cache lines, etc.)
▸ Reduces the problem, so that it eventually fits in
cache
SOME KNOWN APPROACHES
▸ Matrix multiplication - Strassen algorithm
▸ Matrix transposition - Frigo’s transpose
▸ Tree traversal - van Emde Boas layout
▸ Hashing - blocked probing
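To make the idea concrete, here is a divide-and-conquer transposition in the spirit of the recursive, cache-oblivious approach: the larger dimension is halved until the sub-block is small, so at some recursion depth the working set fits in each cache level without the code knowing any cache parameters. This is a sketch under stated assumptions, not the code from the talk; `RecursiveTranspose` and the `Cutoff` constant are illustrative.

```scala
object RecursiveTranspose {
  // Sub-blocks with at most this many elements are transposed directly.
  // The cutoff only limits recursion overhead; it is not tuned to any cache.
  private val Cutoff = 16

  // Transposes a rows x cols row-major matrix into a cols x rows one.
  def transpose(src: Array[Long], rows: Int, cols: Int): Array[Long] = {
    require(src.length == rows * cols, "dimensions do not match the array")
    val dst = new Array[Long](rows * cols)
    go(src, dst, rows, cols, 0, rows, 0, cols)
    dst
  }

  // Handles the sub-matrix [r0, r1) x [c0, c1) of `src`.
  private def go(src: Array[Long], dst: Array[Long], rows: Int, cols: Int,
                 r0: Int, r1: Int, c0: Int, c1: Int): Unit = {
    val h = r1 - r0
    val w = c1 - c0
    if (h * w <= Cutoff) {
      var i = r0
      while (i < r1) {
        var j = c0
        while (j < c1) {
          dst(j * rows + i) = src(i * cols + j)
          j += 1
        }
        i += 1
      }
    } else if (h >= w) {
      val mid = r0 + h / 2 // split the taller side
      go(src, dst, rows, cols, r0, mid, c0, c1)
      go(src, dst, rows, cols, mid, r1, c0, c1)
    } else {
      val mid = c0 + w / 2 // split the wider side
      go(src, dst, rows, cols, r0, r1, c0, mid)
      go(src, dst, rows, cols, r0, r1, mid, c1)
    }
  }
}
```

Unlike a tiled loop, nothing here needs retuning when the code moves to a machine with different cache sizes.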
SYNCHRONISATION HAS ITS PRICE
AN EXAMPLE FROM EVERYDAY LIFE
▸ An event log
▸ Writer - writes events to the log in a linear fashion
▸ Transformer - concurrently tails the log and transforms the events given some predefined function
▸ Transformer is never ahead of the writer
A SIMPLE EVENT LOG
trait EventLog[T] {
  def writeNext(ev: T): Boolean
  def transformNext(f: T => T): Boolean
}
A SIMPLE EVENT LOG
var writerPos = 0L
var transfPos = 0L
val log = new Array[Int](logSize)
A SIMPLE EVENT LOG
def writeNext(ev: Int): Boolean = synchronized {
  if (writerPos < transfPos) {
    false
  } else {
    log(writerPos.toInt) = ev
    writerPos += 1
    true
  }
}
A SIMPLE EVENT LOG
def transformNext(f: Int => Int): Boolean = synchronized {
  if (transfPos >= writerPos) {
    false
  } else {
    val currentEvent = log(transfPos.toInt)
    log(transfPos.toInt) = f(currentEvent)
    transfPos += 1
    true
  }
}
DELETING ALL SENSITIVE DATA
log.transformNext(_ => 0)
READY FOR GDPR
SYNCHRONIZED STATS
Log Type       L2 Miss (M)   L3 Miss (M)   IPC    Ops/s (M)
Synchronized   1084          137           0.29   13.2
Lock free      357           156           0.42   17.2
Lazy set       304           73            0.8    42.6
Padded         211           50            1.4    76.5
LOCK-FREE IMPLEMENTATION
@volatile var writerPos = 0L
@volatile var transfPos = 0L
ENSURING MEMORY VISIBILITY
▸ @volatile introduces a happens-before relationship
▸ All changes made before a volatile write are
visible to other threads that subsequently read it
▸ Does not mean that values are read from main
memory
▸ Writes are applied to the L1 cache and flow
through the cache subsystem
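The happens-before guarantee can be seen in a small publication example: a plain field is written first, then a `@volatile` flag, and a reader that observes the flag is guaranteed to see the plain write as well. This is a minimal sketch; `VolatilePublication`, `Mailbox` and `roundTrip` are hypothetical names.

```scala
object VolatilePublication {
  final class Mailbox {
    private var payload: Int = 0                 // plain, non-volatile field
    @volatile private var ready: Boolean = false // publication flag

    def publish(value: Int): Unit = {
      payload = value // ordinary write ...
      ready = true    // ... published by the volatile write that follows it
    }

    // Spins until the volatile flag is observed, then reads the payload.
    def await(): Int = {
      while (!ready) {}
      payload
    }
  }

  // Starts a reader thread, publishes a value and returns what the reader saw.
  def roundTrip(value: Int): Int = {
    val box = new Mailbox
    var seen = 0
    val reader = new Thread(new Runnable {
      def run(): Unit = { seen = box.await() }
    })
    reader.start()
    box.publish(value)
    reader.join() // join also orders the write to `seen` before the read below
    seen
  }
}
```

Without `@volatile` on `ready`, the reader would have no such guarantee and could spin forever or observe a stale payload.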
LOCK-FREE STATS
Log Type       L2 Miss (M)   L3 Miss (M)   IPC    Ops/s (M)
Synchronized   1084          137           0.29   13.2
Lock-free      357           156           0.42   17.2
Lazy set       304           73            0.8    42.6
Padded         211           50            1.4    76.5
GOING ATOMIC
val writerPos = AtomicLong(0L)
val transfPos = AtomicLong(0L)
val log = new Array[Int](logSize)

def writeNext(ev: Int): Boolean = {
  val currentWriterPos = writerPos.get
  if (currentWriterPos < transfPos.get) {
    false
  } else {
    log(currentWriterPos.toInt) = ev
    writerPos.lazySet(currentWriterPos + 1)
    true
  }
}
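The slide shows only the writer side. Below is a self-contained sketch of the whole log under the single-writer, single-transformer assumption. It swaps in `java.util.concurrent.atomic.AtomicLong` for the `AtomicLong(0L)` wrapper used on the slide, and it adds a bounds check, a `transformNext` and a `snapshot` helper that are not in the original; `LazySetEventLog` is a hypothetical name.

```scala
import java.util.concurrent.atomic.AtomicLong

// Assumes exactly one writer thread and one transformer thread.
final class LazySetEventLog(logSize: Int) {
  private val writerPos = new AtomicLong(0L)
  private val transfPos = new AtomicLong(0L)
  private val log = new Array[Int](logSize)

  def writeNext(ev: Int): Boolean = {
    val current = writerPos.get()
    if (current >= logSize) false // log is full (check added for this sketch)
    else {
      log(current.toInt) = ev
      // Ordered store: the array write above cannot be reordered after it.
      writerPos.lazySet(current + 1)
      true
    }
  }

  def transformNext(f: Int => Int): Boolean = {
    val current = transfPos.get()
    if (current >= writerPos.get()) false // never run ahead of the writer
    else {
      log(current.toInt) = f(log(current.toInt))
      transfPos.lazySet(current + 1)
      true
    }
  }

  // Copy of the events written so far; meant for inspection once both threads are idle.
  def snapshot(): List[Int] = log.take(writerPos.get().toInt).toList
}
```

`lazySet` is safe here only because each position has a single owner that writes it; with several writers a compare-and-set loop would be needed instead.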
LAZY SET STATS
Log Type       L2 Miss (M)   L3 Miss (M)   IPC    Ops/s (M)
Synchronized   1084          137           0.29   13.2
Lock free      357           156           0.42   17.2
Lazy set       304           73            0.8    42.6
Padded         211           50            1.4    76.5
FALSE SHARING
▸ Two threads modifying independent variables
sharing the same cache line
▸ Often depends on the layout of your objects
▸ Causes invalidation of cache lines and increased
coherency protocol traffic
▸ The cache line ping-pongs through L3, which has
significant latency implications
▸ Can be even worse when the threads run on
different sockets (crossing interconnects)
ENSURING COHERENCE (MESI)
[State diagram: transitions between the MODIFIED, EXCLUSIVE, SHARED and INVALID states, labeled with the events below]
PR - processor read
PW - processor write
BR - observed bus read
BW - observed bus write
S - shared
~S - not shared
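The textbook MESI transitions can be written down as a small state machine, using the same event names as the legend above. This is a sketch of the standard protocol as usually described, not of any particular CPU; the `Mesi` object and its type names are illustrative.

```scala
object Mesi {
  sealed trait State
  case object Modified extends State
  case object Exclusive extends State
  case object Shared extends State
  case object Invalid extends State

  sealed trait Event
  // PR: processor read; `othersHoldLine` is the S / ~S signal from the legend.
  final case class ProcessorRead(othersHoldLine: Boolean) extends Event
  case object ProcessorWrite extends Event // PW
  case object BusRead extends Event        // BR
  case object BusWrite extends Event       // BW

  def next(state: State, event: Event): State = (state, event) match {
    case (Invalid, ProcessorRead(true))  => Shared    // PR/S
    case (Invalid, ProcessorRead(false)) => Exclusive // PR/~S
    case (_, ProcessorRead(_))           => state     // read hit in M, E or S
    case (_, ProcessorWrite)             => Modified  // other copies get invalidated
    case (Invalid, BusRead)              => Invalid
    case (_, BusRead)                    => Shared    // M writes back before sharing
    case (_, BusWrite)                   => Invalid
  }
}
```

False sharing shows up in this model as a line bouncing between Modified in one core and Invalid in the other on every write.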
FALSE SHARING
@volatile var writerPos = 0L
@volatile var transfPos = 0L
Cache Line 1
64 Bytes
Cache Line 2
Cache Line 3
Cache Line N
…
▸ Inspecting Java Object Layout
https://github.com/ktoso/sbt-jol
PADDING TO AVOID FALSE SHARING
val writerPos = AtomicLong.withPadding(0, LeftRight128)
val transfPos = AtomicLong.withPadding(0, LeftRight128)
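To show what such padding amounts to, here is a hand-rolled sketch that surrounds a hot field with unused longs through a class hierarchy, a trick used by several concurrency libraries. It assumes 64-byte cache lines and a JVM that lays out superclass fields before subclass fields; the actual layout is not guaranteed and should be checked with a tool such as JOL. All names in `ManualPadding` are hypothetical.

```scala
object ManualPadding {
  // 7 longs (56 bytes) of padding before the hot field.
  abstract class LeftPadding {
    var p1, p2, p3, p4, p5, p6, p7: Long = 0L
  }

  // The only field that is actually read and written.
  abstract class HotValue extends LeftPadding {
    @volatile var value: Long = 0L
  }

  // 7 more longs after it, so neighbouring objects land on other lines.
  final class PaddedLong extends HotValue {
    var q1, q2, q3, q4, q5, q6, q7: Long = 0L
  }
}
```

A ready-made padded atomic, as on the slide, is usually the better choice; the sketch only illustrates the memory cost that is traded for fewer coherency misses.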
PADDED STATS
Log Type       L2 Miss (M)   L3 Miss (M)   IPC    Ops/s (M)
Synchronized   1084          137           0.29   13.2
Lock free      357           156           0.42   17.2
Lazy set       304           73            0.8    42.6
Padded         211           50            1.4    76.5
USE CASE FROM REAL LIFE
AKKA MESSAGE LIFECYCLE
[Diagram: Sender, ActorRef, Dispatcher, Executor Service (threads T1, T2 … Tn), Actor]
▸ Sender - sends a message through the ActorRef
▸ Dispatcher - enqueues the message, then schedules and runs the mailbox
TYPES OF AKKA DISPATCHERS
▸ Default Dispatcher - used if no other is specified
▸ Pinned Dispatcher - one thread per actor
▸ Calling Thread Dispatcher - for tests only
TYPES OF EXECUTORS
▸ ForkJoinPool - relies on lock-free work-stealing
queues
▸ ThreadPoolExecutor - uses a linked blocking queue
to distribute tasks
THREAD POOL EXECUTOR
[Diagram: an external component submits tasks (R1, R2 … Rn) to a shared queue; threads T1 … Tn dequeue and execute them]
LIMITATION: NO ACTOR-TO-THREAD AFFINITY
▸ Potentially causing CPU cache invalidation
▸ Lack of parameters allowing you to achieve
fine-grained control
AFFINITY POOL
[Diagram: an external component submits a task; a queue selector picks the queue to submit to; each thread (T1, T2 … Tn) dequeues tasks from its own queue (R1, R2 … Rn)]
FAIR DISTRIBUTION QUEUE SELECTOR
▸ Adaptive work assignment strategy
▸ Few actors - explicit mapping (fairer)
▸ More actors - consistent hashing (cheaper)
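The core idea of the pool can be sketched with plain `java.util.concurrent` executors: every worker thread owns its own queue, and a selector maps each key (think: actor) to one worker, so the same key always runs on the same thread. This is not Akka's implementation; `KeyAffinityPool` is a hypothetical name and the selector uses plain hashing rather than the adaptive strategy described above.

```scala
import java.util.concurrent.{ExecutorService, Executors, TimeUnit}

// Routes every task with the same key to the same single-threaded worker.
final class KeyAffinityPool(workers: Int) {
  require(workers > 0, "need at least one worker")

  // One single-threaded executor per worker, each with its own task queue.
  private val pool: Array[ExecutorService] =
    Array.fill(workers)(Executors.newSingleThreadExecutor())

  // Queue selector: plain hashing of the key.
  private def select(key: Any): ExecutorService =
    pool(java.lang.Math.floorMod(key.hashCode, workers))

  def submit(key: Any)(task: () => Unit): Unit =
    select(key).execute(new Runnable { def run(): Unit = task() })

  // Stops accepting tasks and waits for the queued ones to finish.
  def shutdownAndWait(): Unit = {
    pool.foreach(_.shutdown())
    pool.foreach(_.awaitTermination(10, TimeUnit.SECONDS))
  }
}
```

Because a key never migrates between threads, the data it touches has a better chance of still being in that core's cache when its next task runs.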
ADVANTAGES
▸ Fewer cache misses due to temporal locality
▸ Decreases contention
▸ Customisable queue selection
CLIENT ACTOR
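import java.util.concurrent.CountDownLatch
import akka.actor.Actor
import scala.collection.mutable
import scala.util.Random

// User and Request are the benchmark's message types (not shown on the slides).
class UserQueryActor(latch: CountDownLatch,
                     numQueries: Int,
                     numUsersInDB: Int) extends Actor {
  private var left = numQueries
  private val receivedUsers: mutable.Map[Int, User] = mutable.Map()
  private val randGenerator = new Random()

  override def receive: Receive = {
    case u: User =>
      receivedUsers.put(u.userId, u)
      if (left == 0) {
        // All queries answered: signal the benchmark and stop.
        latch.countDown()
        context stop self
      } else {
        sender() ! Request(randGenerator.nextInt(numUsersInDB))
      }
      left -= 1
  }
}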
SERVICE ACTOR
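import java.util.concurrent.CountDownLatch
import akka.actor.Actor

// User and Request are the benchmark's message types (not shown on the slides).
class UserServiceActor(userDb: Map[Int, User],
                       latch: CountDownLatch,
                       numQueries: Int) extends Actor {
  private var left = numQueries

  def receive = {
    case Request(id) =>
      userDb.get(id) match {
        case Some(u) => sender() ! u
        case None    => () // unknown id: no reply
      }
      if (left == 0) {
        latch.countDown()
        context stop self
      }
      left -= 1
  }
}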
BENCHMARK RESULTS
[Bar chart: throughput (million msg/s) by dispatcher.throughput setting (1, 5, 50), comparing the Affinity, Fork Join and Fixed Size executors]
SO… IF YOU ARE ON A HOT CODEPATH
▸ Make sure you really are
▸ Measure everything (sbt-jmh, sbt-jol, perf stat …)
▸ Watch out for language features that can
introduce unintended allocations (e.g. pattern
matching)
▸ Use algorithms and data structures that are cache
friendly
▸ Use efficient concurrency tools but try not to roll
your own: JCTools, Akka, Vert.x …
RESOURCES
▸ Name-based extractors
https://hseeberger.wordpress.com/2013/10/04/name-based-extractors-in-scala-2-11/
▸ Cache oblivious algorithms (MIT OCW)
https://www.youtube.com/watch?v=CSqbjfCCLrU
▸ Lazy Set in detail
http://psy-lob-saw.blogspot.bg/2012/12/atomiclazyset-is-performance-win-for.html
▸ Processor Counter Monitor
https://github.com/opcm/pcm
▸ Memory Access Patterns
https://mechanical-sympathy.blogspot.bg/2012/08/memory-access-patterns-are-important.html
▸ False Sharing
https://mechanical-sympathy.blogspot.bg/2011/07/false-sharing.html
https://mechanical-sympathy.blogspot.bg/2013/02/cpu-cache-flushing-fallacy.html
▸ Slides and code
https://github.com/zaharidichev/scala-days-2018-berlin

DianaGray10
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 

High Performance Systems Without Tears - Scala Days Berlin 2018

  • 1. HIGH PERFORMANCE SYSTEMS WITHOUT TEARS ZAHARI DICHEV SCALA DAYS BERLIN 2018
  • 3. THINGS TO KEEP IN MIND ▸ Actual object instantiation costs time ▸ Creates a lot more work for the Garbage Collector ▸ May introduce systemic pauses in your application ▸ Introduces non-determinism
  • 4. THINGS TO KEEP IN MIND ▸ A fair number of systems implement GC-free data structures living entirely off-heap ▸ Some commercial solutions provide pause-free GC implementations (Azul Zing) ▸ There are even proposals to introduce a No-Op GC in Java
  • 6. EXTRACTORS IN A NUTSHELL ▸ An object with an unapply method ▸ Takes an object and tries to give back its components ▸ Useful in pattern matching, partial functions, etc ▸ A great tool for achieving brevity and expressiveness
  • 7. EXTRACTING BLACK ROOKS case class Rook(x: Int, y: Int, isBlack: Boolean) ▸ We simply need to go through a board of rooks ▸ Match on all rooks that are black ▸ Extract the x and y coordinates
  • 8. EXTRACTING BLACK ROOKS
      object BlackRook {
        def unapply(rook: Rook): Option[(Int, Int)] =
          if (rook.isBlack) Some((rook.x, rook.y)) else None
      }
  • 9. EXTRACTING BLACK ROOKS
      public unapply(Lextractors/Rook;)Lscala/Option;
       L0
        LINENUMBER 5 L0
        ALOAD 1
        INVOKEVIRTUAL extractors/Rook.isBlack ()Z
        IFEQ L1
        NEW scala/Some
        DUP
        NEW scala/Tuple2$mcII$sp
        DUP
        . . .

      object BlackRook {
        def unapply(rook: Rook): Option[(Int, Int)] =
          if (rook.isBlack) Some((rook.x, rook.y)) else None
      }
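The Option-based extractor above can be exercised with a small board scan. This is only a sketch: `BoardScan` and `blackRookPositions` are hypothetical names that do not appear in the slides, while `Rook` and `BlackRook` are copied from slides 7 and 8. Every successful match goes through the `Some` and tuple allocations visible in the bytecode.

```scala
final case class Rook(x: Int, y: Int, isBlack: Boolean)

// Option-based extractor as shown on the slides: each match on a black
// rook allocates a Some and a Tuple2.
object BlackRook {
  def unapply(rook: Rook): Option[(Int, Int)] =
    if (rook.isBlack) Some((rook.x, rook.y)) else None
}

// Hypothetical helper (not from the slides): collect the coordinates
// of every black rook on a board.
object BoardScan {
  def blackRookPositions(board: Seq[Rook]): Seq[(Int, Int)] =
    board.collect { case BlackRook(x, y) => (x, y) }
}
```

Calling `BoardScan.blackRookPositions` on a board keeps only the rooks for which `unapply` returned `Some`.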
  • 11. ALLOCATION FREE EXTRACTORS ▸ Name-based extractors introduced in 2.11 ▸ Returning Option is no longer needed ▸ We need to return an object defining two methods def isEmpty: Boolean = . . . def get: T = . . .
  • 12. ALLOCATION-FREE EXAMPLE
      object BlackRookNameBased {
        class Extractor[T <: AnyRef](val extraction: T) extends AnyVal {
          def isEmpty: Boolean = extraction eq null
          def get: T = extraction
        }
        def unapply(rook: Rook): Extractor[(Int, Int)] =
          if (rook.isBlack) new Extractor((rook.x, rook.y))
          else new Extractor(null)
      }
  • 13. ALLOCATION-FREE EXAMPLE (BYTECODE)
      public unapply(Lextractors/Rook;)Lscala/Tuple2;
       L0
        ALOAD 1
        INVOKEVIRTUAL extractors/Rook.isBlack ()Z
        IFEQ L1
       L2
        NEW scala/Tuple2$mcII$sp
        DUP
        ALOAD 1
        INVOKEVIRTUAL extractors/Rook.x ()I
        ALOAD 1
        INVOKEVIRTUAL extractors/Rook.y ()I
        INVOKESPECIAL scala/Tuple2$mcII$sp.<init> (II)V
        GOTO L3
       L1
        . . .
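A possible way to use the name-based extractor in a hot loop is sketched below. `NameBasedScan` and `sumBlackCoordinates` are hypothetical names introduced for illustration; the extractor itself follows slide 12. Because `Extractor` is a value class, the wrapper should not be allocated, so only the tuple remains, as the bytecode's `Lscala/Tuple2;` return type suggests.

```scala
final case class Rook(x: Int, y: Int, isBlack: Boolean)

object BlackRookNameBased {
  // Value class exposing the isEmpty/get pair that name-based
  // pattern matching looks for; null stands for "no match".
  final class Extractor[T <: AnyRef](val extraction: T) extends AnyVal {
    def isEmpty: Boolean = extraction eq null
    def get: T = extraction
  }

  def unapply(rook: Rook): Extractor[(Int, Int)] =
    if (rook.isBlack) new Extractor((rook.x, rook.y))
    else new Extractor(null)
}

// Hypothetical helper (not from the slides): sum x + y over all black rooks.
object NameBasedScan {
  def sumBlackCoordinates(board: Array[Rook]): Long = {
    var sum = 0L
    var i = 0
    while (i < board.length) {
      board(i) match {
        case BlackRookNameBased(x, y) => sum += x + y
        case _                        => ()
      }
      i += 1
    }
    sum
  }
}
```

The pattern syntax at the call site is identical to the Option-based version; only the return type of `unapply` changes.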
  • 15. EXECUTION TIME COMPARISON [Bar chart: time (ms, axis 0 to 400) against board size (512, 1024, 2048, 4096) for the Name-based and Default extractors. Bar labels as extracted: 369, 120, 47, 24 and 144, 44, 24, 18]
  • 16. GC STATS
      ▸ -XX:+PrintGCApplicationStoppedTime
        Enables printing of how long each pause (for example, a GC pause) lasted.
      ▸ -XX:+PrintGCApplicationConcurrentTime
        Enables printing of the time elapsed since the last pause (for example, a GC pause).
      ▸ -XX:+PrintGCTimeStamps
        Enables printing of a time stamp at every GC.
  • 17. HEAP STATS (DEFAULT)
       num   #instances    #bytes  class name
      ----------------------------------------------
         1:      262144    376704  scala.Tuple2$mcII$sp
         2:      262144   6291456  extractors.Rook
         3:      197796   3164736  java.lang.Integer
         4:       11931    286344  java.lang.String
         5:       17592    281472  scala.Some
    HEAP STATS (NAME BASED)
       num   #instances    #bytes  class name
      ----------------------------------------------
         1:      262144    376704  scala.Tuple2$mcII$sp
         2:      262144   6291456  extractors.Rook
         3:      197796   3164736  java.lang.Integer
         4:       11949    286776  java.lang.String
         5:        3108    174048  jdk.internal.org.objectweb.asm.Item
  • 18. ALL MEMORY IS NOT CREATED EQUAL
  • 19. HOW LATENCY IS MASKED ▸ Modern CPUs have a multitude of caches ▸ These caches vary in size and latency ▸ Main purpose is to mask latency and ensure our CPU’s progress is not severely hindered by main memory latency ▸ A cache miss is one of the most prominent performance killers
  • 20. HIERARCHY OF CACHES ▸ L1 - core-local cache split into separate 32 KB data and 32 KB instruction caches (1.5 ns) ▸ L2 - core-local cache of 256 KB holding both data and instructions (5 ns) ▸ L3 - typically 6 MB, shared between cores (16 - 25 ns) ▸ RAM - large in size (60 ns)
  • 21. MEMORY HIERARCHY EXAMPLE CORE L1 L2 CORE L1 L2 L3 MAIN MEMORY
  • 22. MATRIX TRANSPOSITION
       1  2  3  4
       5  6  7  8
       9 10 11 12
      13 14 15 16
  • 23. MATRIX TRANSPOSITION
       1  2  3  4                     1  5  9 13
       5  6  7  8     TRANSPOSED      2  6 10 14
       9 10 11 12                     3  7 11 15
      13 14 15 16                     4  8 12 16
  • 24. MATRIX TRANSPOSITION
      def transpose: Matrix = {
        val newMatrix = Matrix.empty(rows)
        for {i <- 0 until rows}
          for {j <- 0 until cols}
            newMatrix.data.update(j + i * rows, this.data(i + j * cols))
        newMatrix
      }
  • 25. MATRIX TRANSPOSITION (same transpose method as on slide 24) ▸ We have cache and memory ▸ Latency to memory is significantly higher ▸ We load data in cache lines of size 32 bytes (4 longs) ▸ Cache size is one line
  • 26. DATA ACCESSES 1 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16 TRANSPOSED
  • 30. BETTER ACCESS PATTERN 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16 TRANSPOSED 1 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16
  • 34. EXECUTION TIME COMPARISON [Bar chart: time (ms, axis 0 to 7000) against matrix size (side: 2048, 4096, 8192, 16384) for the Cache Friendly and Naive transpositions. Bar labels as extracted: 6 800, 1 360, 351, 120 and 3 100, 822, 238, 115]
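The slides do not show the code behind the cache-friendly variant, so the following is only a sketch of one common way to get a better access pattern: transposing in small square tiles, so that the lines touched in both the source and the destination stay cached while a tile is processed. `Transpose`, `naive`, `blocked` and the default tile size of 32 are assumptions made for illustration, as is the flat row-major `Array[Long]` layout.

```scala
object Transpose {
  // Naive transposition of an n x n row-major matrix: reads are
  // sequential, but writes jump n elements at a time.
  def naive(src: Array[Long], n: Int): Array[Long] = {
    val dst = new Array[Long](n * n)
    var i = 0
    while (i < n) {
      var j = 0
      while (j < n) {
        dst(j * n + i) = src(i * n + j)
        j += 1
      }
      i += 1
    }
    dst
  }

  // Tiled transposition: the matrix is processed in block x block tiles.
  // The block size is a hypothetical tuning knob, not a value from the talk.
  def blocked(src: Array[Long], n: Int, block: Int = 32): Array[Long] = {
    val dst = new Array[Long](n * n)
    var bi = 0
    while (bi < n) {
      val iMax = math.min(bi + block, n)
      var bj = 0
      while (bj < n) {
        val jMax = math.min(bj + block, n)
        var i = bi
        while (i < iMax) {
          var j = bj
          while (j < jMax) {
            dst(j * n + i) = src(i * n + j)
            j += 1
          }
          i += 1
        }
        bj += block
      }
      bi += block
    }
    dst
  }
}
```

Both methods produce the same result; any speed difference comes purely from the order in which memory is touched and should be measured (for example with sbt-jmh) rather than assumed.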
  • 35. TOOLS TO MONITOR CACHE STATS
      ▸ Intel VTune
        https://software.intel.com/en-us/intel-vtune-amplifier-xe
      ▸ Likwid
        https://github.com/RRZE-HPC/likwid
      ▸ Perf Stat
        https://perf.wiki.kernel.org
      ▸ Intel's PCM
        https://github.com/opcm/pcm
  • 36. INTEL OPEN PCM
      Core (SKT) | L3MISS | L2MISS | L3HIT | L2HIT | L3MPI | L2MPI | TEMP
        0    0     8184 K   1993 K   0.30    0.43    0.01    0.01    35
        1    0     1448 K   1982 K   0.27    0.24    0.01    0.01    35
        2    0     4550 K   7100 K   0.36    0.46    0.00    0.00    33
        3    0     1191 K   1658 K   0.28    0.34    0.00    0.01    33
        4    0     9316 K     12 M   0.24    0.12    0.01    0.01    30
        5    0     1042 K   1451 K   0.28    0.26    0.01    0.01    30
        6    0     6700 K   9294 K   0.28    0.37    0.00    0.01    25
        7    0     1013 K   1527 K   0.34    0.27    0.01    0.01    25
      --------------------------------------------------------------------
  • 38. L2 CACHE MISSES [Bar chart: L2 cache misses (axis 0 to 600) against matrix size (side: 2048, 4096, 8192, 16384) for the Cache Friendly and Naive transpositions. Bar labels as extracted: 546, 176, 46, 21 and 268, 96, 44, 19]
  • 39. CACHE OBLIVIOUS ALGORITHMS ▸ A bit of a misleading name… ▸ An algorithm designed to take advantage of the underlying memory hierarchy ▸ No need to know details of the cache (size, length of the cache lines, etc.) ▸ Reduces the problem, so that it eventually fits in cache
  • 40. SOME KNOWN APPROACHES ▸ Matrix multiplication - Strassen algorithm ▸ Matrix transposition - Frigo’s transpose ▸ Tree traversal - van Emde Boas layout ▸ Hashing - blocked probing
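As a rough illustration of the divide-and-conquer idea behind cache-oblivious transposition, the sketch below keeps halving the larger dimension of the current sub-block until it is small, then copies it directly. It is written from the general description on slide 39 and is not the code from the talk; `CacheObliviousTranspose`, its parameters and the `cutoff` base-case size are hypothetical. A strictly cache-oblivious version would recurse down to single elements; the cutoff only trims recursion overhead.

```scala
object CacheObliviousTranspose {
  // Transposes a rows x cols row-major matrix into a cols x rows one.
  // cutoff (hypothetical, must be >= 1) is the sub-block side below
  // which we stop recursing and copy directly.
  def transpose(src: Array[Long], rows: Int, cols: Int, cutoff: Int = 16): Array[Long] = {
    val dst = new Array[Long](rows * cols)

    // Handles the sub-block of rows [r0, r1) and columns [c0, c1).
    def recurse(r0: Int, r1: Int, c0: Int, c1: Int): Unit = {
      val h = r1 - r0
      val w = c1 - c0
      if (h <= cutoff && w <= cutoff) {
        var i = r0
        while (i < r1) {
          var j = c0
          while (j < c1) {
            dst(j * rows + i) = src(i * cols + j)
            j += 1
          }
          i += 1
        }
      } else if (h >= w) {
        // Split the taller dimension in half.
        val mid = r0 + h / 2
        recurse(r0, mid, c0, c1)
        recurse(mid, r1, c0, c1)
      } else {
        // Split the wider dimension in half.
        val mid = c0 + w / 2
        recurse(r0, r1, c0, mid)
        recurse(r0, r1, mid, c1)
      }
    }

    recurse(0, rows, 0, cols)
    dst
  }
}
```

No cache size or line length appears anywhere in the recursion, which is the sense in which the approach is "oblivious": at some depth the sub-blocks fit whatever cache level is present.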
  • 42. EXAMPLE FROM THE DAY IN THE LIFE ▸ An event log ▸ Writer - writes events to the log in linear fashion ▸ Transformer - concurrently tails the log and transforms the events given some predefined function ▸ Transformer is never ahead of the writer
  • 43. A SIMPLE EVENT LOG trait EventLog[T] { def writeNext(ev: T): Boolean def transformNext(f: T => T): Boolean }
  • 44. A SIMPLE EVENT LOG var writerPos = 0L var transfPos = 0L val log = new Array[Int](logSize)
  • 45. A SIMPLE EVENT LOG
      def writeNext(ev: Int): Boolean = synchronized {
        if (writerPos < transfPos) {
          false
        } else {
          log(writerPos.toInt) = ev
          writerPos += 1
          true
        }
      }
  • 46. A SIMPLE EVENT LOG
      def transformNext(f: Int => Int): Boolean = synchronized {
        if (transfPos >= writerPos) {
          false
        } else {
          val currentEvent = log(transfPos.toInt)
          log(transfPos.toInt) = f(currentEvent)
          transfPos += 1
          true
        }
      }
  • 47. DELETING ALL SENSITIVE DATA log.transformNext(_ => 0)
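Putting slides 43 to 47 together, a self-contained version of the synchronized log might look like the sketch below. Two things are assumptions rather than content from the slides: the writer's guard is replaced by a capacity check so the example cannot run past the end of the array, and `snapshot` is a hypothetical helper added only so the result can be inspected.

```scala
final class SynchronizedEventLog(logSize: Int) {
  private var writerPos = 0L
  private var transfPos = 0L
  private val log = new Array[Int](logSize)

  // Appends an event; returns false once the log is full.
  // (The capacity check is an assumption, not the guard from the slide.)
  def writeNext(ev: Int): Boolean = synchronized {
    if (writerPos >= logSize) {
      false
    } else {
      log(writerPos.toInt) = ev
      writerPos += 1
      true
    }
  }

  // Transforms the next untransformed event in place; returns false
  // when the transformer has caught up with the writer.
  def transformNext(f: Int => Int): Boolean = synchronized {
    if (transfPos >= writerPos) {
      false
    } else {
      val currentEvent = log(transfPos.toInt)
      log(transfPos.toInt) = f(currentEvent)
      transfPos += 1
      true
    }
  }

  // Hypothetical helper: copy of everything written so far.
  def snapshot: Vector[Int] = synchronized {
    log.take(writerPos.toInt).toVector
  }
}
```

With this in place, `log.transformNext(_ => 0)` overwrites exactly one event per call, which is the "deleting sensitive data" use from slide 47.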
  • 49. SYNCHRONIZED STATS Columns: L2 miss (M) / L3 miss (M) / IPC / Ops/s (M), where M = million ▸ Synchronized: 1084 / 137 / 0.29 / 13.2 ▸ Lock free: 357 / 156 / 0.42 / 17.2 ▸ Lazy set: 304 / 73 / 0.8 / 42.6 ▸ Padded: 211 / 50 / 1.4 / 76.5
  • 50. LOCK-FREE IMPLEMENTATION @volatile var writerPos = 0L @volatile var transfPos = 0L
  • 51. ENSURING MEMORY VISIBILITY ▸ Introduces a happens-before relationship ▸ All changes prior to that have happened and are visible to other threads ▸ Does not mean that values are read from main memory ▸ Writes are applied to the L1 cache and flow through the cache subsystem
  • 52. LOCK-FREE STATS Columns: L2 miss (M) / L3 miss (M) / IPC / Ops/s (M), where M = million ▸ Synchronized: 1084 / 137 / 0.29 / 13.2 ▸ Lock-free: 357 / 156 / 0.42 / 17.2 ▸ Lazy set: 304 / 73 / 0.8 / 42.6 ▸ Padded: 211 / 50 / 1.4 / 76.5
  • 53. GOING ATOMIC
      val writerPos = AtomicLong(0L)
      val transfPos = AtomicLong(0L)
      val log = new Array[Int](logSize)

      def writeNext(ev: Int): Boolean = {
        val currentWriterPos = writerPos.get
        if (currentWriterPos < transfPos.get) {
          false
        } else {
          log(currentWriterPos.toInt) = ev
          writerPos.lazySet(currentWriterPos + 1)
          true
        }
      }
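The same idea can be sketched with the JDK's `java.util.concurrent.atomic.AtomicLong`, whose `lazySet` performs an ordered write without the full fence of a volatile store. This is an illustration under several assumptions: the slides' `AtomicLong(0L)` appears to come from a library wrapper, which is swapped here for the JDK class; exactly one writer thread and one transformer thread are assumed; and the capacity check and the `read` helper are hypothetical additions.

```scala
import java.util.concurrent.atomic.AtomicLong

// Single-writer / single-transformer event log (assumption: at most one
// thread calls writeNext and at most one thread calls transformNext).
final class LazySetEventLog(logSize: Int) {
  private val writerPos = new AtomicLong(0L)
  private val transfPos = new AtomicLong(0L)
  private val log = new Array[Int](logSize)

  def writeNext(ev: Int): Boolean = {
    val current = writerPos.get
    if (current >= logSize) {
      false // capacity check is an assumption, not from the slides
    } else {
      log(current.toInt) = ev
      // Ordered store: the array write above becomes visible to a thread
      // that subsequently observes the new position via get.
      writerPos.lazySet(current + 1)
      true
    }
  }

  def transformNext(f: Int => Int): Boolean = {
    val current = transfPos.get
    if (current >= writerPos.get) {
      false
    } else {
      log(current.toInt) = f(log(current.toInt))
      transfPos.lazySet(current + 1)
      true
    }
  }

  // Hypothetical helper; only safe once both threads have finished.
  def read(index: Int): Int = log(index)
}
```

Since each position has a single owner, no compare-and-swap is needed; whether `lazySet` actually helps on a given machine is something to confirm with measurements such as the IPC and cache-miss counters shown in the stats slides.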
  • 54. LAZY SET STATS Columns: L2 miss (M) / L3 miss (M) / IPC / Ops/s (M), where M = million ▸ Synchronized: 1084 / 137 / 0.29 / 13.2 ▸ Lock free: 357 / 156 / 0.42 / 17.2 ▸ Lazy set: 304 / 73 / 0.8 / 42.6 ▸ Padded: 211 / 50 / 1.4 / 76.5
  • 55. FALSE SHARING ▸ Two threads modifying independent variables that share the same cache line ▸ Often depends on the layout of your objects ▸ Causes invalidation of cache lines and increased coherency protocol traffic ▸ The cache line ping-pongs through L3, which has significant latency implications ▸ Can be even worse when the threads run on different sockets (crossing interconnects)
  • 56. ENSURING COHERENCE (MESI) [State diagram with states INVALID, EXCLUSIVE, SHARED and MODIFIED. Transition labels as extracted: BR + BW, PR/S, BW, BW, PR/~S, BR, PW, BW, PW, BR, PW, PR + BR, PR + PW, PR] Legend: PR - processor read ▸ PW - processor write ▸ BR - observed bus read ▸ BW - observed bus write ▸ S - shared ▸ ~S - not shared
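The MESI diagram can be read as a small transition function over one cache line as seen by one core. The sketch below encodes the textbook MESI transitions using the slide's legend (PR, PW, BR, BW, S, ~S); the Scala names are hypothetical, and real processors use more elaborate variants of the protocol.

```scala
object Mesi {
  sealed trait State
  case object Modified extends State
  case object Exclusive extends State
  case object Shared extends State
  case object Invalid extends State

  sealed trait Event
  // PR: processor read; the flag says whether another cache holds the line (S / ~S).
  final case class ProcessorRead(otherCachesHoldLine: Boolean) extends Event
  case object ProcessorWrite extends Event // PW
  case object BusRead extends Event        // BR: another core reads the line
  case object BusWrite extends Event       // BW: another core writes the line

  // Next state of a single cache line in the local cache.
  def next(state: State, event: Event): State = (state, event) match {
    case (Invalid, ProcessorRead(true))  => Shared    // PR/S
    case (Invalid, ProcessorRead(false)) => Exclusive // PR/~S
    case (_, ProcessorRead(_))           => state     // read hit keeps the state
    case (_, ProcessorWrite)             => Modified  // PW always ends in Modified
    case (Invalid, BusRead)              => Invalid
    case (_, BusRead)                    => Shared    // another reader appears
    case (_, BusWrite)                   => Invalid   // another writer invalidates us
  }
}
```

False sharing shows up here as a loop: each core's write puts its own copy in Modified and, through the observed bus write, forces the other core's copy to Invalid, even though the two cores touch different variables on the line.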
  • 57. FALSE SHARING @volatile var writerPos = 0L @volatile var transfPos = 0L Cache Line 1 64 Bytes Cache Line 2 Cache Line 3 Cache Line N …
  • 58. FALSE SHARING @volatile var writerPos = 0L @volatile var transfPos = 0L Cache Line 1 64 Bytes Cache Line 2 Cache Line 3 Cache Line N … ▸ Inspecting Java Object Layout
 https://github.com/ktoso/sbt-jol
  • 59. PADDING TO AVOID FALSE SHARING val writerPos = AtomicLong.withPadding(0, LeftRight128) val transfPos = AtomicLong.withPadding(0, LeftRight128)
  • 60. PADDED STATS Columns: L2 miss (M) / L3 miss (M) / IPC / Ops/s (M), where M = million ▸ Synchronized: 1084 / 137 / 0.29 / 13.2 ▸ Lock free: 357 / 156 / 0.42 / 17.2 ▸ Lazy set: 304 / 73 / 0.8 / 42.6 ▸ Padded: 211 / 50 / 1.4 / 76.5
  • 61. USE CASE FROM REAL LIFE
  • 62. AKKA MESSAGE LIFECYCLE Sender Sends a message through ActorRef ActorRef Actor Enqueues message Schedules and runs the mailbox Executor Service T1 T2 Tn Dispatcher
  • 63. TYPES OF AKKA DISPATCHERS ▸ Default Dispatcher - Used if no other specified ▸ Pinned Dispatcher - one thread per actor ▸ Calling Thread Dispatcher - for tests only
  • 64. TYPES OF EXECUTORS ▸ ForkJoinPool - Relies on lock free work stealing queues ▸ ThreadPoolExecutor - Uses Linked blocking queue to distribute tasks
  • 65. THREAD POOL EXECUTOR External Component Submits a task R1 R2 Rn T1 Tn T1 Dequeues and executes task
  • 66. LIMITATION: NO ACTOR-TO-THREAD AFFINITY ▸ Potentially causing CPU cache invalidation ▸ Lack of parameters allowing you to achieve fine grained control
  • 67. AFFINITY POOL External Component Submits a task R1 R2 Rn T1 R1 R2 Rn T2 R1 R2 Rn Tn Queue selector Picks the queue to submit to Dequeue tasks
  • 68. FAIR DISTRIBUTION QUEUE SELECTOR ▸ Adaptive work assignment strategy ▸ Few actors - explicit mapping (fairer) ▸ More actors - consistent hashing (cheaper)
  • 69. ADVANTAGES ▸ Fewer cache misses due to temporal locality ▸ Decreases contention ▸ Customisable queue selection
  • 70. CLIENT ACTOR
      class UserQueryActor(latch: CountDownLatch, numQueries: Int, numUsersInDB: Int) extends Actor {
        private var left = numQueries
        private val receivedUsers: mutable.Map[Int, User] = mutable.Map()
        private val randGenerator = new Random()

        override def receive: Receive = {
          case u: User => {
            receivedUsers.put(u.userId, u)
            if (left == 0) {
              latch.countDown()
              context stop self
            } else {
              sender() ! Request(randGenerator.nextInt(numUsersInDB))
            }
            left -= 1
          }
        }
      }
  • 71. SERVICE ACTOR
      class UserServiceActor(userDb: Map[Int, User], latch: CountDownLatch, numQueries: Int) extends Actor {
        private var left = numQueries

        def receive = {
          case Request(id) =>
            userDb.get(id) match {
              case Some(u) => sender() ! u
              case None =>
            }
            if (left == 0) {
              latch.countDown()
              context stop self
            }
            left -= 1
        }
      }
  • 72. BENCHMARK RESULTS [Bar chart: Msg/s (M, axis 0 to 6) against dispatcher.throughput (1, 5, 50) for the Affinity, Fork Join and Fixed Size executors. Bar labels as extracted: 5,2 3,3 1,4 / 5 4,4 3,7 / 5,4 4,8 4,7]
  • 73. SO… IF YOU ARE ON A HOT CODEPATH ▸ Make sure you really are ▸ Measure everything (sbt-jmh, sbt-jol, perfstat …) ▸ Watch out for language features that can introduce unintended allocations (e.g. pattern matching) ▸ Use algorithms and data structures that are cache friendly ▸ Use efficient concurrency tools but try to not roll your own: JCTools, Akka, Vert.x …
  • 74. RESOURCES
      ▸ Name based extractors
        https://hseeberger.wordpress.com/2013/10/04/name-based-extractors-in-scala-2-11/
      ▸ Cache oblivious algorithms (MIT OCW)
        https://www.youtube.com/watch?v=CSqbjfCCLrU
      ▸ Lazy Set in detail
        http://psy-lob-saw.blogspot.bg/2012/12/atomiclazyset-is-performance-win-for.html
      ▸ Processor Counter Monitor
        https://github.com/opcm/pcm
      ▸ Memory Access Patterns
        https://mechanical-sympathy.blogspot.bg/2012/08/memory-access-patterns-are-important.html
      ▸ False Sharing
        https://mechanical-sympathy.blogspot.bg/2011/07/false-sharing.html
        https://mechanical-sympathy.blogspot.bg/2013/02/cpu-cache-flushing-fallacy.html
      ▸ Slides and code
        https://github.com/zaharidichev/scala-days-2018-berlin