Konrad `@ktosopl` Malawski
The Need for Async: hot pursuit for Scalable apps
ScalaWorld 2015
Konrad `ktoso` Malawski
Akka Team,
Reactive Streams TCK
(we’re renaming soon!)
Konrad `@ktosopl` Malawski
akka.io
typesafe.com
geecon.org
Java.pl / KrakowScala.pl
sckrk.com / meetup.com/Paper-Cup @ London
GDGKrakow.pl
lambdakrk.pl
(we’re renaming soon!)
Nice to meet you!
Who are you guys?
High Performance Software Development
For the majority of the time, high performance software development is not about compiler hacks and bit twiddling.

It is about fundamental design principles that are key to doing any effective software development.
Martin Thompson
practicalperformanceanalyst.com/2015/02/17/getting-to-know-martin-thompson-...
Agenda
• Why?
• Sync and Async basics / definitions
• Async where it matters: Scheduling
• How NOT to measure Latency
• Concurrent < lock-free < wait-free
• I/O: IO, AIO, NIO, Zero
• Sorting a.k.a. hiding the horrible mutable stuff
• Distributed Systems: Where Async is at Home
• Wrapping up and Q/A
Why?
Why this talk?
“The free lunch is over”
Herb Sutter
(A Fundamental Turn Toward Concurrency in Software)
How and why
Asynchronous Processing
comes into play for
Scalability and Performance.
I want you to make
well-informed decisions,
not do buzzword-driven development.
Sync / Async Basics
Sync / Async
Highly parallel systems
Event loops
Actors
Async where it matters:
Async where it matters: Scheduling
Scheduling (notice the grey sync call)
Scheduling (now with Async db call)
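The contrast the diagrams draw can be sketched with plain `scala.concurrent` Futures; `blockingFetch` and `asyncFetch` here are hypothetical stand-ins for a database driver, not APIs from the slides:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical "database" calls. The blocking variant occupies its thread for
// the whole wait; wrapping it in Future moves that wait onto a pool thread, so
// the caller's thread is immediately free to serve other requests. (A truly
// asynchronous driver would not block any thread at all.)
def blockingFetch(id: Int): String = { Thread.sleep(50); s"user-$id" }
def asyncFetch(id: Int): Future[String] = Future(blockingFetch(id))

// The caller composes follow-up work without blocking:
val greeting: Future[String] = asyncFetch(42).map(u => s"hello, $u")

// Only at the very edge of the program do we actually wait:
val result = Await.result(greeting, 1.second)
println(result) // hello, user-42
```

Note the hedge in the comment: `Future { blockingCall() }` only relocates the blocking; it still pins a pool thread, which is why the scheduling diagrams matter.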
Lesson learned:
Blocking is indistinguishable from being very slow.
Latency
Latency Quiz #1: Which one is Latency?
By the queueing theory definitions:
Sometimes devs use Latency and Response Time as the same thing.
That’s OK, as long as both sides know which one they are talking about.
Latency Quiz #2
Gil Tene style, see: “How NOT to Measure Latency”
Is 10s latency acceptable in your app?
Is 200ms latency acceptable?
How about most responses within 200ms?
So mostly 20ms and some 1 minute latencies is OK?
Do people die when we go above 200ms?
So 90% below 200ms, 99% below 1s, 99.99% below 2s?
Latency in the “real world”
Gil Tene style, see: “How NOT to Measure Latency”
“Our response time is 200ms average,
stddev is around 60ms”
— a typical quote
Latency does NOT behave like normal distribution!
“So yeah, our 99.99% is…”
http://hdrhistogram.github.io/HdrHistogram/
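To make the point concrete: a toy computation (not HdrHistogram, and the sample values are invented) showing how mean and stddev hide a hiccup tail that the percentiles expose immediately:

```scala
// A toy latency sample (in ms): mostly fast responses plus a small "hiccup"
// tail. Just enough arithmetic to show why quoting mean/stddev misleads for
// latency distributions.
val samples: Vector[Double] = Vector.fill(990)(20.0) ++ Vector.fill(10)(2000.0)

val mean   = samples.sum / samples.size
val stddev = math.sqrt(samples.map(x => (x - mean) * (x - mean)).sum / samples.size)

def percentile(p: Double): Double = {
  val sorted = samples.sorted
  sorted(math.min(sorted.size - 1, (p / 100.0 * sorted.size).toInt))
}

// mean ~40 ms with a scary stddev -- but the numbers that matter are the
// percentiles: 1% of requests took two full seconds.
println(f"mean=$mean%.1f ms, stddev=$stddev%.1f ms")
println(s"p50=${percentile(50)} ms, p99=${percentile(99)} ms, p99.9=${percentile(99.9)} ms")
```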
Hiccups
Gil Tene style, see: “How NOT to Measure Latency”
Lesson learned:
Use precise language when talking about latencies.
Measure the right thing!
“Trust no-one, bench everything!”
Verify the results!
Ask on groups or forums.
Concurrent < lock-free < wait-free
“concurrent” data structure
Concurrent < lock-free < wait-free
What can happen in concurrent data structures:
A tries to write; B tries to write; B wins!
A tries to write; C tries to write; C wins!
A tries to write; D tries to write; D wins!
A tries to write; B tries to write; B wins!
A tries to write; E tries to write; E wins!
A tries to write; F tries to write; F wins!
…
Moral?
1) Thread A is a complete loser.
2) Thread A may never make progress.
Concurrent < lock-free < wait-free
Remember: Concurrency is NOT Parallelism.
Rob Pike - Concurrency is NOT Parallelism (video)
def offer(a: A): Boolean // returns false on failure

def add(a: A): Unit      // throws on failure

def put(a: A): Unit      // blocks until able to enqueue
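These are the three enqueue flavours of `java.util.concurrent.BlockingQueue`; a capacity-1 queue shows the difference in a few lines:

```scala
import java.util.concurrent.ArrayBlockingQueue

// A capacity-1 j.u.c.ArrayBlockingQueue demonstrates the three enqueue
// flavours the signatures above describe:
val q = new ArrayBlockingQueue[Int](1)

q.put(1)                  // blocks until there is room (here: succeeds at once)
val offered = q.offer(2)  // queue full -> returns false instead of blocking
val threw =
  try { q.add(3); false } // queue full -> throws IllegalStateException
  catch { case _: IllegalStateException => true }

println(s"offered=$offered, threw=$threw, head=${q.peek()}")
```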
Concurrent < lock-free < wait-free
concurrent data structure
<
lock-free* data structure
* lock-free a.k.a. lockless
What lock-free programming looks like:
An algorithm is lock-free if it satisfies that:
when the program’s threads are run sufficiently long,
at least one of the threads makes progress.
Concurrent < lock-free < wait-free
* Both versions are used: lock-free / lockless
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

class CASBackedQueue[A] {
  val _queue = new AtomicReference(Vector[A]())

  // may “spin”: retries until its CAS wins
  @tailrec final def put(a: A): Unit = {
    val queue    = _queue.get
    val appended = queue :+ a
    if (!_queue.compareAndSet(queue, appended))
      put(a)
  }
}
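A quick way to see the lock-free guarantee in action is to hammer the queue from several threads (the class is repeated here so the sketch runs standalone):

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// The CAS-backed queue from the slide, repeated so this sketch runs standalone.
class CASBackedQueue[A] {
  val _queue = new AtomicReference(Vector[A]())
  @tailrec final def put(a: A): Unit = {
    val queue = _queue.get
    if (!_queue.compareAndSet(queue, queue :+ a)) put(a) // lost the race: spin
  }
}

// Four threads race their CASes: individual attempts fail and retry, but some
// thread always wins, so the system as a whole makes progress -- which is
// exactly the lock-free guarantee (and why no element can be lost).
val q = new CASBackedQueue[Int]
val threads = (1 to 4).map { t =>
  new Thread(() => (1 to 250).foreach(i => q.put(t * 1000 + i)))
}
threads.foreach(_.start())
threads.foreach(_.join())
println(q._queue.get.size) // 1000
```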
Concurrent < lock-free < wait-free
“concurrent” data structure
<
lock-free* data structure
<
wait-free data structure
* Both versions are used: lock-free / lockless
Concurrent < lock-free < wait-free
Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms
Maged M. Michael, Michael L. Scott
An algorithm is wait-free if every operation has a bound on
the number of steps the algorithm will take before the
operation completes.
wait-free: j.u.c.ConcurrentLinkedQueue
public boolean offer(E e) {
    checkNotNull(e);
    final Node<E> newNode = new Node<E>(e);

    for (Node<E> t = tail, p = t;;) {
        Node<E> q = p.next;
        if (q == null) {
            // p is last node
            if (p.casNext(null, newNode)) {
                // Successful CAS is the linearization point
                // for e to become an element of this queue,
                // and for newNode to become "live".
                if (p != t) // hop two nodes at a time
                    casTail(t, newNode); // Failure is OK.
                return true;
            }
            // Lost CAS race to another thread; re-read next
        }
        else if (p == q)
            // We have fallen off list. If tail is unchanged, it
            // will also be off-list, in which case we need to
            // jump to head, from which all live nodes are always
            // reachable. Else the new tail is a better bet.
            p = (t != (t = tail)) ? t : head;
        else
            // Check for tail updates after two hops.
            p = (p != t && t != (t = tail)) ? t : q;
    }
}
This is a modification of the Michael & Scott algorithm, adapted for a garbage-collected environment, with support for interior node deletion (to support remove(Object)).

For explanation, read the paper.
Act-as-if referentially transparent
trait Cache[T] {
  def mut(t: T): Boolean
  def observe(): List[T]
}
Act-as-if referentially transparent
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

class Acc[T] extends Cache[T] {
  type Observed = Boolean

  // could be Array based; must start non-null, or the matches below would NPE
  private val _acc = new AtomicReference[(Observed, List[T])]((false, Nil))

  @tailrec final def mut(t: T): Boolean =
    _acc.get match {
      case (true, acc) =>
        false // no mutation allowed anymore!
      case v @ (false, acc) =>
        // stay un-observed while accumulating; retry if the CAS loses
        _acc.compareAndSet(v, (false, t :: acc)) || mut(t)
    }

  @tailrec final def observe(): List[T] =
    _acc.get match {
      case (true, acc) =>
        acc
      case v @ (_, acc) =>
        if (!_acc.compareAndSet(v, v.copy(_1 = true))) observe()
        else acc
    }
}
I vaguely remember Daniel mentioning this technique some time ago?
Lesson learned:
Different ways to construct concurrent data structures.
Pick the right impl for the right job.
I / O
IO / AIO / NIO / Zero
Synchronous I / O
— Havoc Pennington
(HAL, GNOME, GConf, D-BUS, now Typesafe)
When I learned J2EE around 2008 with some of my desktop colleagues, our reactions included something like:

”wtf is this sync IO crap, where is the main loop?!” :-)
Interruption!
CPU: User Mode / Kernel Mode
Kernels and CPUs
Kernels and CPUs
[…] switching from user-level to kernel-level on a (2.8 GHz) P4 is 1348 cycles.

[…] Counting actual time, the P4 takes 481ns […]
http://wiki.osdev.org/Context_Switching
I / O
“Don’t worry.
It only gets worse!”
Same data in 3 buffers! 4 mode switches!
Asynchronous I / O [Linux]
Linux AIO = JVM NIO
NewIO… since 2004!
(No-one calls it “new” any more)
Less time wasted waiting.
Same amount of buffer copies.
ZeroCopy = sendfile [Linux]
“Work smarter.
Not harder.”
http://fourhourworkweek.com/
ZeroCopy = sendfile [Linux]
Data never leaves kernel mode!
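On the JVM, the doorway to sendfile-style transfers is `FileChannel.transferTo`; a minimal sketch (file-to-file here, while the classic zero-copy win is file-to-socket):

```scala
import java.nio.channels.FileChannel
import java.nio.file.{Files, StandardOpenOption}

// FileChannel.transferTo lets the kernel perform the copy itself where the OS
// supports it (sendfile on Linux), so the bytes need never surface into a
// user-space buffer. Whether sendfile is actually used depends on the OS and
// the target channel -- a SocketChannel is the classic case; this file-to-file
// copy still shows the API shape.
val src = Files.createTempFile("zerocopy", ".src")
val dst = Files.createTempFile("zerocopy", ".dst")
Files.write(src, "hello, sendfile".getBytes("UTF-8"))

val in  = FileChannel.open(src, StandardOpenOption.READ)
val out = FileChannel.open(dst, StandardOpenOption.WRITE)
try {
  var pos  = 0L
  val size = in.size()
  while (pos < size)                           // transferTo may copy fewer bytes
    pos += in.transferTo(pos, size - pos, out)
} finally { in.close(); out.close() }

val copied = new String(Files.readAllBytes(dst), "UTF-8")
println(copied) // hello, sendfile
```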
Lesson learned:
By picking the right tool,
you can avoid wasting CPU time.
Asynchronous IO will not beat a synchronous seek
throughput-wise, but it can allow higher scalability.
Functional outside, Intel inside: Sorting
Jan Pustelnik’s Benchmarks: https://github.com/gosubpl
• Haskell Functional – http://en.literateprograms.org
• Haskell Imperative – http://stackoverflow.com/questions/11481675/using-vectors-for-performance-improvement-in-haskell
• C++ – by-the-book impl, rather simple, plenty of examples in books
• Scala – Scala by Example; and ST monadic example from https://github.com/fpinscala/fpinscala/blob/master/answers/src/main/scala/fpinscala/localeffects/LocalEffects.scala (solutions to Functional Programming in Scala by Paul Chiusano and Runar Bjarnason)
Scala 2.10.5 – Vector#sorted
def sorted[B >: A](implicit ord: Ordering[B]): Repr = {
  val len = this.length
  val arr = new ArraySeq[A](len)
  var i = 0
  for (x <- this.seq) {
    arr(i) = x
    i += 1
  }
  java.util.Arrays.sort(arr.array, ord.asInstanceOf[Ordering[Object]])
  val b = newBuilder
  b.sizeHint(len)
  for (x <- arr) b += x
  b.result
}
Functional outside, Intel inside: Sorting
MicroBenchmarking done right
http://openjdk.java.net/projects/code-tools/jmh/
github.com/ktoso/sbt-jmh
Where are we spending time?
What and how much are we allocating?
…?
MicroBenchmarking DON’T DO THIS:
https://github.com/travisbrown/circe/issues/59
MicroBenchmarking done right
http://openjdk.java.net/projects/code-tools/jmh/
github.com/ktoso/sbt-jmh
OpenJDK JMH
JMH – perf_events integration
http://openjdk.java.net/projects/code-tools/jmh/
github.com/ktoso/sbt-jmh
Perf stats:
--------------------------------------------------
4172.776137 task-clock (msec) # 0.411 CPUs utilized
612 context-switches # 0.147 K/sec
31 cpu-migrations # 0.007 K/sec
195 page-faults # 0.047 K/sec
16,599,643,026 cycles # 3.978 GHz [30.80%]
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
17,815,084,879 instructions # 1.07 insns per cycle [38.49%]
3,813,373,583 branches # 913.870 M/sec [38.56%]
1,212,788 branch-misses # 0.03% of all branches [38.91%]
7,582,256,427 L1-dcache-loads # 1817.077 M/sec [39.07%]
312,913 L1-dcache-load-misses # 0.00% of all L1-dcache hits [38.66%]
35,688 LLC-loads # 0.009 M/sec [32.58%]
<not supported> LLC-load-misses:HG
<not supported> L1-icache-loads:HG
161,436 L1-icache-load-misses:HG # 0.00% of all L1-icache hits [32.81%]
7,200,981,198 dTLB-loads:HG # 1725.705 M/sec [32.68%]
3,360 dTLB-load-misses:HG # 0.00% of all dTLB cache hits [32.65%]
193,874 iTLB-loads:HG # 0.046 M/sec [32.56%]
4,193 iTLB-load-misses:HG # 2.16% of all iTLB cache hits [32.44%]
<not supported> L1-dcache-prefetches:HG
0 L1-dcache-prefetch-misses:HG # 0.000 K/sec [32.33%]
10.159432892 seconds time elapsed
JMH – perfasm integration
Nitsan Wakart’s blog:

http://psy-lob-saw.blogspot.co.uk/2015/07/jmh-perfasm.html
Other tools: Flame Graphs
https://github.com/brendangregg/FlameGraph
https://github.com/jrudolph/perf-map-agent
Immutable DSL / AST outside
streams engine inside
Immutable DSL + Akka engine
We happen to have:
a pretty performant engine…
+
constrained domain
=
Immutable DSL + Akka engine
implicit val mat = ActorMaterializer()

val fives       = Source.repeat(5)
val timesTwo    = Flow[Int].map(_ * 2)
val intToString = Flow[Int].map(_.toString)
val transform   = timesTwo via intToString

val sysOut = Sink.foreach(println)
val r = fives via transform.take(10) to sysOut

r.run() // uses implicit Materializer
Decoupled “what?” from “how?”
What?
val g =

val one: Source[Int, Unit] = Source.single(1)
val two: Source[Int, Promise[Unit]] = Source.lazyEmpty
val three: Source[Int, Unit] = Source.single(3)

val step1: Source[Int, Promise[Unit]] =
  Source(one, two, three)(combineMat = (m1, m2, m3) => m2) {
    implicit b => (on, tw, th) =>
      val m = b.add(Merge[Int](3))
      on ~> m.in(0)
      tw ~> m.in(1)
      th ~> m.in(2)
      m.out
  }

val step2 = Flow() { implicit b =>
  val broadcast = b.add(Broadcast[Int](2, eagerCancel = true))
  val merge     = b.add(Merge[Int](2))
  broadcast.out(0) ~> Flow[Int].map(_ + 2) ~> merge.in(0)
  broadcast.out(1) ~> Flow[Int].map(_ - 2) ~> merge.in(1)
  (broadcast.in, merge.out)
}

How?

g.run()(materializer)
Lifted representation => inspection + optimisation
Lesson learned:
Being able to drop down to the “mutable stuff”
is a huge benefit of Scala,
and is very useful in very specific places!
APIs can stay pure and immutable!
Distributed Systems
“… in which the failure of a computer
you didn't even know existed can
render your own computer unusable.”
— Leslie Lamport
http://research.microsoft.com/en-us/um/people/lamport/pubs/distributed-system.txt
Distributed Systems
The bigger the system,
the more “random” latency / failure noise.
Embrace it instead of hiding it.
Distributed Systems
Backup Requests
Backup requests
A technique for fighting “long tail latencies”
by issuing duplicated work when the SLA seems in danger.
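A minimal sketch of the idea with plain Futures; `query` is a hypothetical stand-in for a replica call, and the scheduler-based `after` helper is our own, not akka's:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

// Backup request: fire the primary, and if it has not completed within
// `backupAfter`, fire an identical backup and take whichever answers first.
val pool      = Executors.newFixedThreadPool(4)
val scheduler = Executors.newSingleThreadScheduledExecutor()
implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

// start `f` only after delay `d` has elapsed
def after[T](d: FiniteDuration)(f: => Future[T]): Future[T] = {
  val p = Promise[T]()
  scheduler.schedule(new Runnable { def run(): Unit = p.completeWith(f) },
                     d.toMillis, TimeUnit.MILLISECONDS)
  p.future
}

def withBackup[T](backupAfter: FiniteDuration)(query: => Future[T]): Future[T] =
  Future.firstCompletedOf(Seq(query, after(backupAfter)(query)))

// A hypothetical replica that hiccups on its first call only:
val calls = new AtomicInteger(0)
def query: Future[String] = Future {
  if (calls.getAndIncrement() == 0) { Thread.sleep(500); "slow-replica" }
  else { Thread.sleep(10); "fast-replica" }
}

val answer = Await.result(withBackup(50.millis)(query), 2.seconds)
println(answer) // fast-replica -- the backup overtook the hiccuping primary
pool.shutdown(); scheduler.shutdown()
```

The trade-off from the table below is visible even here: the price of the fast answer is that two replicas did the work.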
Backup requests
              Avg     Std dev   95%ile   99%ile   99.9%ile
No backups    33 ms   1524 ms   24 ms    52 ms    994 ms
After 10ms    14 ms   4 ms      20 ms    23 ms    50 ms
After 50ms    16 ms   12 ms     57 ms    63 ms    68 ms
Jeff Dean - Achieving Rapid Response Times in Large Online Services
Peter Bailis - Doing Redundant Work to Speed Up Distributed Queries
Akka - Krzysztof Janosz @ Akkathon, Kraków - TailChoppingRouter (docs, pr)
Lesson learned:
Backup requests let you trade
increased load
for decreased latency.
Distributed Systems
Combined Requests
&
Back-pressure
Combined requests & back-pressure
No no no…!
Not THAT Back-pressure!
Combined requests & back-pressure
THAT kind of back-pressure:
www.reactive-streams.org
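The core of that contract fits in a few lines: the downstream signals demand, and the upstream may emit at most that much. This toy version mimics the Reactive Streams idea without the actual `org.reactivestreams` interfaces:

```scala
import scala.collection.mutable

// Back-pressure in miniature: the downstream signals demand via request(n),
// and the upstream may emit at most that many elements -- it never pushes
// blindly into an unbounded buffer.
class SlowSubscriber {
  val received = mutable.Buffer.empty[Int]
  var demand   = 0L
  def request(n: Int): Unit = demand += n // downstream pulls when it has room
  def onNext(elem: Int): Unit = {
    require(demand > 0, "upstream violated back-pressure!")
    demand -= 1
    received += elem
  }
}

val source     = Iterator.from(1) // an "infinite" upstream
val subscriber = new SlowSubscriber

// The upstream emits only while there is outstanding demand:
def pump(): Unit = while (subscriber.demand > 0) subscriber.onNext(source.next())

subscriber.request(3); pump() // downstream ready for 3 -> receives 1, 2, 3
subscriber.request(2); pump() // ready for 2 more       -> receives 4, 5
println(subscriber.received)  // 1, 2, 3, 4, 5 -- the infinite source never overwhelms
```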
Combined requests
A technique for avoiding duplicated work.
By aggregating requests, possibly increasing latency.
“Wat? Why would I increase latency!?”
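Here is a toy combiner showing the payoff: callers ask for individual keys, the downstream is hit once per batching window. `batchFetch` is a hypothetical stand-in for the real downstream call:

```scala
import scala.collection.mutable

// Early callers wait a little longer, and in exchange the backend sees one
// batched call instead of N (with duplicate keys merged for free).
class Combiner[K, V](batchFetch: Seq[K] => Map[K, V]) {
  private val pending = mutable.LinkedHashSet.empty[K]
  var downstreamCalls = 0

  def request(k: K): Unit = synchronized { pending += k }

  // Called when the window closes (e.g. every 10 ms, or when a size limit hits)
  def flush(): Map[K, V] = synchronized {
    if (pending.isEmpty) Map.empty
    else {
      downstreamCalls += 1
      val result = batchFetch(pending.toSeq)
      pending.clear()
      result
    }
  }
}

val combiner = new Combiner[Int, String](keys => keys.map(k => k -> s"value-$k").toMap)
(1 to 5).foreach(combiner.request) // five callers land in the same window...
combiner.request(3)                // ...including a duplicate of key 3
val batch = combiner.flush()
println(s"downstreamCalls=${combiner.downstreamCalls}, keys=${batch.size}") // 1 call, 5 keys
```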
Combined requests
Lesson learned:
Back-pressure saves systems from overload.
Combined requests trade higher latency
for less downstream work.
Wrapping up
You don’t need to be a great mechanic
to be a great racing driver,
but you must understand how your car works!
~ Martin Thompson (in the context of Mechanical Sympathy)
Wrapping up
• Keep your apps pure
• Be aware of internals
• Async all the things!
• Messaging all the way!
Be aware that:
• Someone has to bite the bullet though!
• We’re all running on real hardware.
• Libraries do it so you don’t have to - pick the right one!
Links
• akka.io
• reactive-streams.org
• akka-user

• Gil Tene - How NOT to measure latency, 2013
• Jeff Dean @Velocity 2014
• Alan Bateman, Jeanfrancois Arcand (Sun) Async IO Tips @ JavaOne
• http://linux.die.net/man/2/select
• http://linux.die.net/man/2/poll
• http://linux.die.net/man/4/epoll
• giltene/jHiccup
• Linux Journal: ZeroCopy I, Dragan Stancevis 2013
• Last slide car picture: http://actu-moteurs.com/sprint/gt-tour/jean-philippe-belloc-un-beau-challenge-avec-le-akka-asp-team/2000
Links
• http://wiki.osdev.org/Context_Switching
• CppCon: Herb Sutter "Lock-Free Programming (or, Juggling Razor Blades)"
• http://www.infoq.com/presentations/reactive-services-scale
• Gil Tene’s HdrHistogram.org
• http://hdrhistogram.github.io/HdrHistogram/plotFiles.html
• Rob Pike - Concurrency is NOT Parallelism (video)
• Brendan Gregg - Systems Performance: Enterprise and the Cloud (book)
• http://psy-lob-saw.blogspot.com/2015/02/hdrhistogram-better-latency-capture.html
• Jeff Dean, Luiz Andre Barroso - The Tail at Scale (whitepaper,ACM)
• http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html
• http://www.ulduzsoft.com/2014/01/select-poll-epoll-practical-difference-for-system-architects/
• Marcus Lagergren - Oracle JRockit:The Definitive Guide (book)
• http://mechanical-sympathy.blogspot.com/2013/08/lock-based-vs-lock-free-concurrent.html
• Handling of Asynchronous Events - http://www.win.tue.nl/~aeb/linux/lk/lk-12.html
• http://www.kegel.com/c10k.html
Links
• www.reactivemanifesto.org/
• Seriously the only right way to micro benchmark on the JVM:
• JMH openjdk.java.net/projects/code-tools/jmh/
• JMH for Scala: https://github.com/ktoso/sbt-jmh
• http://www.ibm.com/developerworks/library/l-async/
• http://lse.sourceforge.net/io/aio.html
• https://code.google.com/p/kernel/wiki/AIOUserGuide
• ShmooCon: C10M - Defending the Internet At Scale (Robert Graham)
• http://blog.erratasec.com/2013/02/scalability-its-question-that-drives-us.html#.VO6E11PF8SM
• User-level threads... with threads. - Paul Turner @ Linux Plumbers Conf 2013
• https://www.facebook.com/themainstreetpiggyboys/photos/a.1390784047896753.1073741829.1390594984582326/1423592294615928/?type=1&theater for the Rollin’ Cuy on last slide
Special thanks to:
• Aleksey Shipilëv
• Andrzej Grzesik
• Gil Tene
• Kirk Pepperdine
• Łukasz Dubiel
• Marcus Lagergren
• Martin Thompson
• Mateusz Dymczyk
• Nitsan Wakart
Thanks guys, you’re awesome.
alphabetically, mostly
• Peter Lawrey
• Richard Warburton
• Roland Kuhn
• Runar Bjarnason
• Sergey Kuksenko
• Steve Poole
• Viktor Klang a.k.a. √
• Antoine de Saint Exupéry;-)
• and the entire AkkaTeam
• the Mechanical Sympathy Mailing List
Thanks & Q/A!
ktoso @ typesafe.com
twitter: ktosopl
github: ktoso
team blog: letitcrash.com
home: akka.io
©Typesafe 2015 – All Rights Reserved

The Need for Async @ ScalaWorld

  • 1.
    Konrad `@ktosopl` Malawski needfor async hot pursuit for Scalable apps ScalaWorld 2015
  • 2.
    Konrad `ktoso` Malawski AkkaTeam, Reactive Streams TCK (we’re renaming soon!)
  • 3.
    Konrad `@ktosopl` Malawski akka.io typesafe.com geecon.org Java.pl/ KrakowScala.pl sckrk.com / meetup.com/Paper-Cup @ London GDGKrakow.pl lambdakrk.pl (we’re renaming soon!)
  • 4.
    Nice to meetyou! Who are you guys?
  • 5.
    High Performance SoftwareDevelopment For the majority of the time,
 high performance software development 
 is not about compiler hacks and bit twiddling. 
 
 It is about fundamental design principles that are 
 key to doing any effective software development. Martin Thompson practicalperformanceanalyst.com/2015/02/17/getting-to-know-martin-thompson-...
  • 6.
    Agenda • Why? • Asyncand Synch basics / definitions • Async where it matters: Scheduling • How NOT to measure Latency • Concurrent < lock-free < wait-free • I/O: IO, AIO, NIO, Zero • Sorting a.k.a. hiding the horrible mutable stuff • Distributed Systems: Where Async is at Home • Wrapping up and Q/A
  • 7.
  • 8.
    Why this talk? “Thefree lunch is over” Herb Sutter (A Fundamental Turn Toward Concurrency in Software)
  • 9.
    Why this talk? “Thefree lunch is over” Herb Sutter (A Fundamental Turn Toward Concurrency in Software) How and why Asynchronous Processing comes into play for Scalability and Performance.
  • 10.
    Why this talk? “Thefree lunch is over” Herb Sutter (A Fundamental Turn Toward Concurrency in Software) How and why Asynchronous Processing comes into play for Scalability and Performance. I want you guys make well informed decisions, not buzzword driven development.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
    Highly parallel systems Eventloops Actors Async where it matters:
  • 22.
    Async where itmatters: Scheduling
  • 23.
    Async where itmatters: Scheduling
  • 24.
    Async where itmatters: Scheduling
  • 25.
    Async where itmatters: Scheduling
  • 26.
    Async where itmatters: Scheduling
  • 27.
    Async where itmatters: Scheduling
  • 28.
    Async where itmatters: Scheduling
  • 29.
    Async where itmatters: Scheduling
  • 30.
    Async where itmatters: Scheduling
  • 31.
    Scheduling (notice theygrey sync call)
  • 32.
    Scheduling (notice theygrey sync call)
  • 33.
    Scheduling (now withAsync db call)
  • 34.
    Scheduling (now withAsync db call)
  • 35.
    Scheduling (now withAsync db call)
  • 36.
    Scheduling (now withAsync db call)
  • 37.
    Lesson learned: Blocking isindistinguishable from being very slow.
  • 38.
  • 39.
    Latency Quiz #1:Which one is Latency? By the queueing theory definitions:
  • 40.
    Latency Quiz #1:Which one is Latency? By the queueing theory definitions:
  • 41.
    Latency Quiz #1:Which one is Latency? By the queueing theory definitions:
  • 42.
    Latency Quiz #1:Which one is Latency? By the queueing theory definitions:
  • 43.
    Latency Quiz #1:Which one is Latency?
  • 44.
    Latency Quiz #1:Which one is Latency? Sometimes devs use Latency and Response Time as the same thing. That’s OK, as long as both sides know which one they are talking about.
  • 45.
    Latency Quiz #2 GilTene style, see:“How NOT to Measure Latency” Is 10s latency acceptable in your app? Is 200ms latency acceptable? How about most responses within 200ms? So mostly 20ms and some 1 minute latencies is OK? Do people die when we go above 200ms? So 90% below 200ms, 99% bellow 1s, 99.99% below 2s?
  • 46.
    Latency in the“real world” Gil Tene style, see:“How NOT to Measure Latency” “Our response time is 200ms average, stddev is around 60ms” — a typical quote
  • 47.
    Latency in the“real world” Gil Tene style, see:“How NOT to Measure Latency” “Our response time is 200ms average, stddev is around 60ms” — a typical quote Latency does NOT behave like normal distribution! “So yeah, our 99,99%’ is…”
  • 48.
  • 49.
    Gil Tene style,see:“How NOT to Measure Latency” Hiccups
  • 50.
    Gil Tene style,see:“How NOT to Measure Latency” Hiccups
  • 51.
    Gil Tene style,see:“How NOT to Measure Latency” Hiccups
  • 52.
    Gil Tene style,see:“How NOT to Measure Latency” Hiccups
  • 53.
    Gil Tene style,see:“How NOT to Measure Latency” Hiccups
  • 54.
    Lesson learned: Use preciselanguage when talking about latencies. Measure the right thing! “Trust no-one, bench everything!” Verify the results! Ask on groups or forums.
  • 55.
  • 56.
  • 57.
  • 58.
    Concurrent < lock-free< wait-free “concurrent” data structure
  • 59.
    Concurrent < lock-free< wait-free What can happen in concurrent data structures: A tries to write; B tries to write; B wins! A tries to write; C tries to write; C wins! A tries to write; D tries to write; D wins! A tries to write; B tries to write; B wins! A tries to write; E tries to write; E wins! A tries to write; F tries to write; F wins! … Moral? 1) Thread A is a complete loser. 2) Thread A may never make progress.
  • 60.
    What can happenin concurrent data structures: A tries to write; B tries to write; B wins! C tries to write; C wins! D tries to write; D wins! B tries to write; B wins! E tries to write; E wins! F tries to write; F wins! … Moral? 1) Thread A is a complete loser. 2) Thread A may never make progress. Concurrent < lock-free < wait-free
  • 61.
    Concurrent < lock-free< wait-less Remember: Concurrency is NOT Parallelism. Rob Pike - Concurrency is NOT Parallelism (video) def offer(a: A): Boolean // returns on failure 
 def add(a: A): Unit // throws on failure
 def put(a: A): Boolean // blocks until able to enqueue
  • 62.
    Concurrent < lock-free< wait-free concurrent data structure < lock-free* data structure * lock-free a.k.a. lockless
  • 63.
  • 64.
    An algorithm islock-free if it satisfies that: When the program threads are run sufficiently long, at least one of the threads makes progress. Concurrent < lock-free < wait-free
  • 65.
    * Both versionsare used: lock-free / lockless class CASBackedQueue[A] { val _queue = new AtomicReference(Vector[A]()) // may “spin” @tailrec final def put(a: A): Unit = { val queue = _queue.get val appended = queue :+ a if (!_queue.compareAndSet(queue, appended)) put(a) } } Concurrent < lock-free < wait-free
  • 66.
    class CASBackedQueue[A] { val_queue = new AtomicReference(Vector[A]()) // may “spin” @tailrec final def put(a: A): Unit = { val queue = _queue.get val appended = queue :+ a if (!_queue.compareAndSet(queue, appended)) put(a) } } Concurrent < lock-free < wait-free
  • 67.
    class CASBackedQueue[A] { val_queue = new AtomicReference(Vector[A]()) // may “spin” @tailrec final def put(a: A): Unit = { val queue = _queue.get val appended = queue :+ a if (!_queue.compareAndSet(queue, appended)) put(a) } } Concurrent < lock-free < wait-free
  • 68.
  • 69.
  • 70.
  • 71.
  • 72.
  • 73.
  • 74.
    Concurrent < lock-free< wait-free “concurrent” data structure < lock-free* data structure < wait-free data structure * Both versions are used: lock-free / lockless
  • 75.
    Concurrent < lock-free< wait-free Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms
 Maged M. Michael Michael L. Scott An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes.
  • 76.
    wait-free: j.u.c.ConcurrentLinkedQueue Simple, Fast,and Practical Non-Blocking and Blocking Concurrent Queue Algorithms
 Maged M. Michael Michael L. Scott public boolean offer(E e) { checkNotNull(e); final Node<E> newNode = new Node<E>(e); for (Node<E> t = tail, p = t;;) { Node<E> q = p.next; if (q == null) { // p is last node if (p.casNext(null, newNode)) { // Successful CAS is the linearization point // for e to become an element of this queue, // and for newNode to become "live". if (p != t) // hop two nodes at a time casTail(t, newNode); // Failure is OK. return true; } // Lost CAS race to another thread; re-read next } else if (p == q) // We have fallen off list. If tail is unchanged, it // will also be off-list, in which case we need to // jump to head, from which all live nodes are always // reachable. Else the new tail is a better bet. p = (t != (t = tail)) ? t : head; else // Check for tail updates after two hops. p = (p != t && t != (t = tail)) ? t : q; } } This is a modification of the Michael & Scott algorithm, adapted for a garbage-collected environment, with support for interior node deletion (to support remove(Object)). 
 For explanation, read the paper.
  • 77.
  • 78.
    Act-as-if referentially transparent traitCache[T] { def mut(t: T): Boolean def observe(): List[T] }
  • 79.
    Act-as-if referentially transparent classAcc[T] extends Cache[T] { type Observed = Boolean private val _acc = new AtomicReference[(Observed, List[T])]() // could be Array based @tailrec final def mut(t: T): Boolean = { _acc.get match { case (true, acc) => false // no mutation allowed anymore! case v @ (false, acc) => _acc.compareAndSet(v, (true, t :: acc)) || mut(t) } } @tailrec final def observe(): List[T] = { _acc.get match { case (true, acc) => acc case v @ (_, acc) => if (!_acc.compareAndSet(v, v.copy(_1 = true))) observe() else acc } } } I vaguely remember Daniel mentioning this technique some time ago?
  • 80.
    Lesson learned: Different waysto construct concurrent data structures. Pick the right impl for the right job.
  • 81.
  • 82.
  • 83.
    IO / AIO/ NIO
  • 84.
    IO / AIO/ NIO / Zero
  • 85.
    Synchronous I/O
    Havoc Pennington (HAL, GNOME, GConf, D-BUS, now Typesafe):
    When I learned J2EE about 2008 with some of my desktop colleagues, our reactions included something like:
    ”wtf is this sync IO crap, where is the main loop?!” :-)
  • 86.
  • 87.
  • 88.
  • 89.
  • 90.
  • 91.
  • 92.
    Kernels and CPUs
    “[…] switching from user-level to kernel-level on a (2.8 GHz) P4 is 1348 cycles.
    […] Counting actual time, the P4 takes 481 ns […]”
    http://wiki.osdev.org/Context_Switching
  • 93.
  • 94.
  • 95.
  • 96.
  • 97.
  • 98.
    I/O
    “Don’t worry. It only gets worse!”
  • 99.
    I/O
    “Don’t worry. It only gets worse!”
    Same data in 3 buffers! 4 mode switches!
  • 100.
    Asynchronous I/O [Linux]
    Linux AIO = JVM NIO
  • 101.
    Asynchronous I/O [Linux]
    New IO… since 2004! (No-one calls it “new” any more.)
    Linux AIO = JVM NIO
  • 102.
    Asynchronous I/O [Linux]
    Less time wasted waiting. Same number of buffer copies.
  • 103.
    Zero copy = sendfile [Linux]
    “Work smarter. Not harder.” http://fourhourworkweek.com/
  • 104.
  • 105.
    Zero copy = sendfile [Linux]
    Data never leaves kernel mode!
  • 106.
    Lesson learned: By picking the right tool you can avoid wasting CPU time. Asynchronous IO will not beat a synchronous seek throughput-wise, but it can allow higher scalability.
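A sketch of the same idea from the JVM (not from the slides; the helper name zeroCopy is mine): FileChannel.transferTo asks the kernel to move the bytes between channels directly, and on Linux it can use sendfile(2), so the data need not be copied through user space:

```scala
import java.nio.channels.FileChannel
import java.nio.file.{Files, Path, StandardOpenOption}

// transferTo delegates the copy to the kernel where possible (sendfile on
// Linux); the loop handles the case where one call moves fewer bytes than asked.
def zeroCopy(src: Path, dst: Path): Long = {
  val in  = FileChannel.open(src, StandardOpenOption.READ)
  val out = FileChannel.open(dst, StandardOpenOption.WRITE, StandardOpenOption.CREATE)
  try {
    val size = in.size()
    var pos  = 0L
    while (pos < size) pos += in.transferTo(pos, size - pos, out)
    pos // total bytes transferred
  } finally { in.close(); out.close() }
}
```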
  • 107.
  • 108.
  • 109.
    Functional outside, Intel inside: Sorting
    Jan Pustelnik’s benchmarks: https://github.com/gosubpl
    • Haskell Functional – http://en.literateprograms.org
    • Haskell Imperative – http://stackoverflow.com/questions/11481675/using-vectors-for-performance-improvement-in-haskell
    • C++ – by-the-book impl, rather simple, plenty of examples in books
    • Scala – Scala by Example; and an ST monadic example from https://github.com/fpinscala/fpinscala/blob/master/answers/src/main/scala/fpinscala/localeffects/LocalEffects.scala (solutions to Functional Programming in Scala by Paul Chiusano and Runar Bjarnason)
  • 110.
    Jan Pustelnik’s benchmarks: https://github.com/gosubpl
    Functional outside, Intel inside: Sorting (benchmark charts)
  • 114.
    Scala 2.10.5 – Vector#sorted

    def sorted[B >: A](implicit ord: Ordering[B]): Repr = {
      val len = this.length
      val arr = new ArraySeq[A](len)
      var i = 0
      for (x <- this.seq) {
        arr(i) = x
        i += 1
      }
      java.util.Arrays.sort(arr.array, ord.asInstanceOf[Ordering[Object]])
      val b = newBuilder
      b.sizeHint(len)
      for (x <- arr) b += x
      b.result
    }

    Functional outside, Intel inside: Sorting
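The same “functional outside, Intel inside” shape in miniature (illustration only, not the library code): the caller sees a pure function while the body sorts a local array in place:

```scala
// Pure signature, locally-mutable implementation: the temporary array never
// escapes, so callers still see a referentially transparent function.
def sortedVec(xs: Vector[Int]): Vector[Int] = {
  val arr = xs.toArray       // local mutable buffer
  java.util.Arrays.sort(arr) // in-place, cache-friendly sort
  arr.toVector               // back to immutable land
}
```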
  • 115.
  • 116.
    Microbenchmarking
    DON’T DO THIS: https://github.com/travisbrown/circe/issues/59
  • 118.
  • 119.
    JMH – perf_events integration
    http://openjdk.java.net/projects/code-tools/jmh/
    github.com/ktoso/sbt-jmh

    Perf stats:
    --------------------------------------------------
          4172.776137  task-clock (msec)         # 0.411 CPUs utilized
                  612  context-switches          # 0.147 K/sec
                   31  cpu-migrations            # 0.007 K/sec
                  195  page-faults               # 0.047 K/sec
       16,599,643,026  cycles                    # 3.978 GHz [30.80%]
      <not supported>  stalled-cycles-frontend
      <not supported>  stalled-cycles-backend
       17,815,084,879  instructions              # 1.07 insns per cycle [38.49%]
        3,813,373,583  branches                  # 913.870 M/sec [38.56%]
            1,212,788  branch-misses             # 0.03% of all branches [38.91%]
        7,582,256,427  L1-dcache-loads           # 1817.077 M/sec [39.07%]
              312,913  L1-dcache-load-misses     # 0.00% of all L1-dcache hits [38.66%]
               35,688  LLC-loads                 # 0.009 M/sec [32.58%]
      <not supported>  LLC-load-misses:HG
      <not supported>  L1-icache-loads:HG
              161,436  L1-icache-load-misses:HG  # 0.00% of all L1-icache hits [32.81%]
        7,200,981,198  dTLB-loads:HG             # 1725.705 M/sec [32.68%]
                3,360  dTLB-load-misses:HG       # 0.00% of all dTLB cache hits [32.65%]
              193,874  iTLB-loads:HG             # 0.046 M/sec [32.56%]
                4,193  iTLB-load-misses:HG       # 2.16% of all iTLB cache hits [32.44%]
      <not supported>  L1-dcache-prefetches:HG
                    0  L1-dcache-prefetch-misses:HG # 0.000 K/sec [32.33%]

    10.159432892 seconds time elapsed
  • 120.
    JMH – perfasm integration
    Nitsan Wakart’s blog: http://psy-lob-saw.blogspot.co.uk/2015/07/jmh-perfasm.html
  • 121.
    Other tools: FlameGraphs https://github.com/brendangregg/FlameGraph https://github.com/jrudolph/perf-map-agent
  • 122.
    Immutable DSL / AST outside
  • 123.
    Immutable DSL / AST outside, streams engine inside
  • 124.
    Immutable DSL + Akka engine
    We happen to have: a pretty performant engine… + a constrained domain =
  • 126.
    Immutable DSL + Akka engine

    implicit val mat = ActorMaterializer()

    val fives       = Source.repeat(5)
    val timesTwo    = Flow[Int].map(_ * 2)
    val intToString = Flow[Int].map(_.toString)
    val transform   = timesTwo via intToString
    val sysOut      = Sink.foreach(println)

    val r = fives via transform.take(10) to sysOut
    r.run() // uses implicit Materializer
  • 129.
    Decoupled “what?” from “how?”

    What?
    val g =

    val one:   Source[Int, Unit]          = Source.single(1)
    val two:   Source[Int, Promise[Unit]] = Source.lazyEmpty
    val three: Source[Int, Unit]          = Source.single(3)
  • 130.
    Decoupled “what?” from “how?”

    What?
    val g =

    val one:   Source[Int, Unit]          = Source.single(1)
    val two:   Source[Int, Promise[Unit]] = Source.lazyEmpty
    val three: Source[Int, Unit]          = Source.single(3)

    val step1: Source[Int, Promise[Unit]] =
      Source(one, two, three)(combineMat = (m1, m2, m3) => m2) { implicit b => (on, tw, th) =>
        val m = b.add(Merge[Int](3))
        on ~> m.in(0)
        tw ~> m.in(1)
        th ~> m.in(2)
        m.out
      }
  • 132.
    Decoupled “what?” from “how?”

    What?
    val g =

    val step2 = Flow() { implicit b =>
      val broadcast = b.add(Broadcast[Int](2, eagerCancel = true))
      val merge     = b.add(Merge[Int](2))

      broadcast.out(0) ~> Flow[Int].map(_ + 2) ~> merge.in(0)
      broadcast.out(1) ~> Flow[Int].map(_ - 2) ~> merge.in(1)

      (broadcast.in, merge.out)
    }
  • 133.
    Decoupled “what?” from “how?”

    What? val g =
    How?  g.run()(materializer)
  • 134.
    Decoupled “what?” from “how?”

    What? val g =
    How?  g.run()(materializer)

    Lifted representation => inspection + optimisation
  • 135.
    Lesson learned: Being able to drop down to the “mutable stuff” is a huge benefit of Scala, and is very useful in very specific places! APIs can stay pure and immutable!
  • 136.
  • 137.
    Distributed Systems
    “… in which the failure of a computer you didn't even know existed can render your own computer unusable.”
    — Leslie Lamport
    http://research.microsoft.com/en-us/um/people/lamport/pubs/distributed-system.txt
  • 138.
    Distributed Systems
    The bigger the system, the more “random” latency / failure noise. Embrace it instead of hiding it.
  • 139.
  • 140.
    Backup requests
    A technique for fighting “long tail” latencies, by issuing duplicated work when the SLA seems in danger.
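The idea can be sketched with plain Scala Futures (not from the slides; the helper names after and withBackup are mine): fire the backup only once the primary has missed its deadline, and take whichever answer arrives first. A real system, like Akka's TailChoppingRouter mentioned below, would also cancel the loser and cap the extra load.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Timer used to delay the backup request; one thread is plenty for a sketch.
val scheduler = Executors.newSingleThreadScheduledExecutor()

// Start `body` only after delay `d` has elapsed.
def after[T](d: FiniteDuration)(body: => Future[T]): Future[T] = {
  val p = Promise[T]()
  scheduler.schedule(new Runnable {
    def run(): Unit = p.completeWith(body)
  }, d.toMillis, TimeUnit.MILLISECONDS)
  p.future
}

// Issue the request; if it hasn't answered within `backupAfter`,
// issue a duplicate and return whichever response wins the race.
def withBackup[T](backupAfter: FiniteDuration)(request: => Future[T]): Future[T] =
  Future.firstCompletedOf(Seq(request, after(backupAfter)(request)))
```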
  • 141.
  • 142.
  • 143.
  • 144.
  • 145.
  • 146.
    Backup requests

                 Avg    Stddev   95%ile  99%ile  99.9%ile
    No backups   33 ms  1524 ms  24 ms   52 ms   994 ms
    After 10 ms  14 ms  4 ms     20 ms   23 ms   50 ms
    After 50 ms  16 ms  12 ms    57 ms   63 ms   68 ms

    Jeff Dean - Achieving Rapid Response Times in Large Online Services
    Peter Bailis - Doing Redundant Work to Speed Up Distributed Queries
    Akka - Krzysztof Janosz @ Akkathon, Kraków - TailChoppingRouter (docs, pr)
  • 147.
    Lesson learned: Backup requests allow trading increased load for decreased latency.
  • 148.
  • 149.
    Combined requests & back-pressure
  • 150.
    Combined requests & back-pressure
    No no no…! Not THAT back-pressure!
  • 151.
    Combined requests & back-pressure
    THAT kind of back-pressure: www.reactive-streams.org
  • 155.
    Combined requests
    A technique for avoiding duplicated work, by aggregating requests, possibly increasing latency.
    “Wat? Why would I increase latency!?”
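A toy sketch of request combining (not from the deck; Combiner and fetchBatch are illustrative names): buffer individual keys and flush them as one batched downstream call. A real implementation would also flush on a timer, as Akka Streams' groupedWithin does.

```scala
// Buffers keys and issues one batched call per `maxBatch` keys, trading a
// little latency for far fewer downstream requests. Single-threaded sketch.
final class Combiner[K, V](maxBatch: Int)(fetchBatch: Seq[K] => Map[K, V]) {
  private var pending = Vector.empty[K]
  private var results = Map.empty[K, V]

  def request(k: K): Unit = {
    pending :+= k
    if (pending.size >= maxBatch) flush()
  }

  def flush(): Unit = if (pending.nonEmpty) {
    results ++= fetchBatch(pending) // one downstream call for many requests
    pending = Vector.empty
  }

  def result(k: K): Option[V] = results.get(k)
}
```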
  • 156.
  • 157.
  • 158.
  • 159.
  • 160.
    Lesson learned: Back-pressure saves systems from overload. Combined requests trade higher latency for less work for the downstream.
  • 161.
  • 162.
    Wrapping up
    You don’t need to be a great mechanic to be a great racing driver, but you must understand how your race car works!
    ~ Martin Thompson (in the context of Mechanical Sympathy)
  • 163.
    Wrapping up
    Be aware that:
    • Someone has to bite the bullet though!
    • We’re all running on real hardware.
    • Libraries do it so you don’t have to - pick the right one!
  • 164.
    Wrapping up
    • Keep your apps pure
    • Be aware of internals
    • Async all the things!
    • Messaging all the way!
    Be aware that:
    • Someone has to bite the bullet though!
    • We’re all running on real hardware.
    • Libraries do it so you don’t have to - pick the right one!
  • 165.
    Links
    • akka.io
    • reactive-streams.org
    • akka-user
    • Gil Tene - How NOT to Measure Latency, 2013
    • Jeff Dean @ Velocity 2014
    • Alan Bateman, Jeanfrancois Arcand (Sun) - Async IO Tips @ JavaOne
    • http://linux.die.net/man/2/select
    • http://linux.die.net/man/2/poll
    • http://linux.die.net/man/4/epoll
    • giltene/jHiccup
    • Linux Journal: Zero Copy I, Dragan Stancevic, 2003
    • Last slide car picture: http://actu-moteurs.com/sprint/gt-tour/jean-philippe-belloc-un-beau-challenge-avec-le-akka-asp-team/2000
  • 166.
    Links
    • http://wiki.osdev.org/Context_Switching
    • CppCon: Herb Sutter - "Lock-Free Programming (or, Juggling Razor Blades)"
    • http://www.infoq.com/presentations/reactive-services-scale
    • Gil Tene’s HdrHistogram.org
    • http://hdrhistogram.github.io/HdrHistogram/plotFiles.html
    • Rob Pike - Concurrency is NOT Parallelism (video)
    • Brendan Gregg - Systems Performance: Enterprise and the Cloud (book)
    • http://psy-lob-saw.blogspot.com/2015/02/hdrhistogram-better-latency-capture.html
    • Jeff Dean, Luiz Andre Barroso - The Tail at Scale (whitepaper, ACM)
    • http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html
    • http://www.ulduzsoft.com/2014/01/select-poll-epoll-practical-difference-for-system-architects/
    • Marcus Lagergren - Oracle JRockit: The Definitive Guide (book)
    • http://mechanical-sympathy.blogspot.com/2013/08/lock-based-vs-lock-free-concurrent.html
    • Handling of Asynchronous Events - http://www.win.tue.nl/~aeb/linux/lk/lk-12.html
    • http://www.kegel.com/c10k.html
  • 167.
    Links
    • www.reactivemanifesto.org/
    • Seriously the only right way to micro-benchmark on the JVM:
      • JMH - openjdk.java.net/projects/code-tools/jmh/
      • JMH for Scala: https://github.com/ktoso/sbt-jmh
    • http://www.ibm.com/developerworks/library/l-async/
    • http://lse.sourceforge.net/io/aio.html
    • https://code.google.com/p/kernel/wiki/AIOUserGuide
    • ShmooCon: C10M - Defending the Internet At Scale (Robert Graham)
    • http://blog.erratasec.com/2013/02/scalability-its-question-that-drives-us.html#.VO6E11PF8SM
    • User-level threads… with threads. - Paul Turner @ Linux Plumbers Conf 2013
    • https://www.facebook.com/themainstreetpiggyboys/photos/a.1390784047896753.1073741829.1390594984582326/1423592294615928/?type=1&theater for the Rollin’ Cuy on last slide
  • 168.
    Special thanks to (alphabetically, mostly):
    • Aleksey Shipilëv
    • Andrzej Grzesik
    • Gil Tene
    • Kirk Pepperdine
    • Łukasz Dubiel
    • Marcus Lagergren
    • Martin Thompson
    • Mateusz Dymczyk
    • Nitsan Wakart
    • Peter Lawrey
    • Richard Warburton
    • Roland Kuhn
    • Runar Bjarnason
    • Sergey Kuksenko
    • Steve Poole
    • Viktor Klang a.k.a. √
    • Antoine de Saint Exupéry ;-)
    • and the entire Akka Team
    • the Mechanical Sympathy Mailing List

    Thanks guys, you’re awesome.
  • 169.
    Thanks & Q/A!
    ktoso@typesafe.com
    twitter: @ktosopl
    github: ktoso
    team blog: letitcrash.com
    home: akka.io
  • 170.
    © Typesafe 2015 - All Rights Reserved