The functional programming paradigm is not often synonymous with performant software. I will try to convince you otherwise. We construct a real-world example around implementing an order book, which is how financial exchanges match buyers and sellers. Our exploration of the order book involves a dive into how scala.collection.immutable.Queue benefits from deferred evaluation to improve performance. Through a series of motivating design questions, we discover how the order book implementation may also benefit from deferred evaluation. Applying our newfound knowledge leads us to develop a JMH microbenchmark to evaluate performance.
By the end of our journey, you will walk away with excellent additions to your developer toolkit. Stocked with an appreciation for the benefits of deferred evaluation, motivating design questions to challenge assumptions, and a working knowledge of JMH to benchmark performance, you will be better prepared to take on challenges in your professional life. Caveat emptor: no such guarantees are made for your personal life!
This presentation is based on material from Scala High Performance Programming, a book I co-authored in 2016. During the presentation, I will provide a book discount code for those interested in learning more.
6. Getting to know the order book
Resting on the book
Buy @ 27.01
ID = 1932
Before:
Bid        Offer
27.01 x 4  27.03 x 1
27.00 x 1  27.05 x 3
26.97 x 2  27.10 x 2
After:
Bid        Offer
27.01 x 5  27.03 x 1
27.00 x 1  27.05 x 3
26.97 x 2  27.10 x 2
Crossing the book
Buy @ 27.05
ID = 7389
Before:
Bid        Offer
27.01 x 4  27.03 x 1
27.00 x 1  27.05 x 3
26.97 x 2  27.10 x 2
After:
Bid        Offer
27.01 x 4  27.05 x 3
27.00 x 1  27.10 x 2
26.97 x 2
Canceling an order request
Cancel
ID = 5502
Before:
Bid        Offer
27.01 x 4  27.03 x 1
27.00 x 1  27.05 x 3
26.97 x 2  27.10 x 2
After:
Bid        Offer
27.01 x 3  27.03 x 1
27.00 x 1  27.05 x 3
26.97 x 2  27.10 x 1
7. Let's model the order book
class TreeMap[A, +B] private (tree: RB.Tree[A, B])
  (implicit val ordering: Ordering[A])
object Price {
  implicit val ordering: Ordering[Price] =
    new Ordering[Price] {
      def compare(x: Price, y: Price): Int =
        Ordering.BigDecimal.compare(x.value, y.value)
    }
}
case class QueueOrderBook(
  bids: TreeMap[Price, Queue[BuyLimitOrder]],
  offers: TreeMap[Price, Queue[SellLimitOrder]]) {
  def bestBid: Option[BuyLimitOrder] = // highest price
    bids.lastOption.flatMap(_._2.headOption)
  def bestOffer: Option[SellLimitOrder] = // lowest price
    offers.headOption.flatMap(_._2.headOption)
}
9. Better know a queue (2/3)
package scala.collection
package immutable
class Queue[+A] protected(
  protected val in: List[A],
  protected val out: List[A]) {
  def enqueue[B >: A](elem: B) = new Queue(elem :: in, out)
  def dequeue: (A, Queue[A]) = out match {
    case Nil if !in.isEmpty =>
      val rev = in.reverse
      (rev.head, new Queue(Nil, rev.tail))
    case x :: xs => (x, new Queue(in, xs))
    case _ => throw new
      NoSuchElementException("dequeue on empty queue")
  }
}
Deferred evaluation
List.reverse: O(N)
11. Understanding performance: buy limit order arrives
Is there a resting sell order priced <= the buy order?
TreeMap.headOption: O(log N)
No: Add resting buy order
TreeMap.get: O(log N)
Queue.enqueue: O(1)
TreeMap.+: O(log N)
Yields LimitOrderAdded
Yes: Cross to execute offer
TreeMap.headOption: O(log N)
Queue.dequeue: amortized O(1)
TreeMap.+: O(log N)
Yields OrderExecuted
12. Understanding performance: order cancel request arrives
Is there a bid with matching ID?
TreeMap.find: O(N)
Queue.exists: O(N)
No: Is there an offer with matching ID?
TreeMap.find: O(N)
Queue.exists: O(N)
No: Yields OrderCancelRejected
Yes (bid or offer): Remove order within price level queue
Queue.filter: O(N)
Yields OrderCanceled
13. Quiz!
Which operation is QueueOrderBook most
optimized for?
A. Adding a resting order
B. Crossing the book
C. Canceling a resting order
D. Rejecting an order cancel request
14. Answer!
Which operation is QueueOrderBook most
optimized for?
A. Adding a resting order
B. Crossing the book
C. Canceling a resting order
D. Rejecting an order cancel request
15. Quiz!
What is the distribution of operation
frequency seen in production?
[Four charts, labeled A through D, each showing a candidate distribution across Resting, Crossing, Canceling, and Rejecting]
16. Answer!
E. None of the above - I haven't given you
enough information
(Forgive me for providing a trick question)
17. Understanding our operating
environment
6 months of historical data show the
distribution below:
[Chart: relative frequency of Resting, Crossing, Canceling, and Rejecting operations]
Does the QueueOrderBook implementation
strike an optimum performance balance?
19. Isolating the problem
When canceling an order, there are two
expensive operations:
1. Identifying the price level containing the
order-to-be-canceled
2. Traversing a Queue to remove the canceled
order
case class QueueOrderBook(
  bids: TreeMap[Price, Queue[BuyLimitOrder]],
  offers: TreeMap[Price, Queue[SellLimitOrder]])
#1: TreeMap, #2: Queue
20. Applying deferred evaluation
How can the order book defer the cost of
linear traversal to modify internal state?
Queue up your ideas!
23. New idea
case class LazyCancelOrderBook(
  pendingCancelIds: Set[OrderId],
  bids: TreeMap[Price, Queue[BuyLimitOrder]],
  offers: TreeMap[Price, Queue[SellLimitOrder]])
Like Queue, let's add state to optimize
the slowest and most frequently seen
operation: canceling
24. A tale of two order cancel requests
QueueOrderBook:
Is there a bid with matching ID?
TreeMap.find: O(N)
Queue.exists: O(N)
Yes: Remove order within price level queue
Queue.filter: O(N)
Yields OrderCanceled
LazyCancelOrderBook:
Add order ID to pending cancels
Set.+: effectively O(1)
Yields OrderCanceled
def handleCancelOrder(
  currentTime: () => EventInstant,
  ob: LazyCancelOrderBook,
  id: OrderId): (LazyCancelOrderBook, Event) =
  ob.copy(pendingCancelIds = ob.pendingCancelIds + id) ->
    OrderCanceled(currentTime(), id)
Similar to Queue.enqueue
27. Will this unit test pass?
"""Given empty book
  |When cancel order arrives
  |Then OrderCancelRejected
""".stripMargin ! Prop.forAll(
  OrderId.genOrderId,
  CommandInstant.genCommandInstant,
  EventInstant.genEventInstant) { (id, ci, ei) =>
  LazyCancelOrderBook.handle(
    () => ei, LazyCancelOrderBook.empty,
    CancelOrder(ci, id))._2 ====
    OrderCancelRejected(ei, id)
}
Public API that supports all order book operations:
def handle(
  currentTime: () => EventInstant,
  ob: LazyCancelOrderBook,
  c: Command): (LazyCancelOrderBook, Event)
28. Motivating Design Question #4
Can I change any constraints to allow me to
model the problem differently?
29. Let's try again
case class LazyCancelOrderBook(
  activeIds: Set[OrderId],
  pendingCancelIds: Set[OrderId],
  bids: TreeMap[Price, Queue[BuyLimitOrder]],
  offers: TreeMap[Price, Queue[SellLimitOrder]])
Since order cancel reject support is a hard
requirement, the new implementation needs
additional state
30. Rejecting invalid cancel requests
QueueOrderBook:
Is there a bid with matching ID?
TreeMap.find: O(N)
Queue.exists: O(N)
No: Is there an offer with matching ID?
TreeMap.find: O(N)
Queue.exists: O(N)
No: Yields OrderCancelRejected
LazyCancelOrderBook:
Is there an active order with matching ID?
Set.contains: effectively O(1)
No: Yields OrderCancelRejected
def handleCancelOrder(
  currentTime: () => EventInstant,
  ob: LazyCancelOrderBook,
  id: OrderId): (LazyCancelOrderBook, Event) =
  ob.activeIds.contains(id) match {
    case true => ob.copy(
      activeIds = ob.activeIds - id,
      pendingCancelIds = ob.pendingCancelIds + id) ->
      OrderCanceled(currentTime(), id)
    case false => ob ->
      OrderCancelRejected(currentTime(), id)
  }
31. Resting buy order requests
Is there a resting sell order priced <= the buy order?
TreeMap.headOption: O(log N)
No: Add resting buy order
TreeMap.get: O(log N)
Queue.enqueue: O(1)
Set.+: effectively O(1)
TreeMap.+: O(log N)
Yields LimitOrderAdded
def handleAddLimitOrder(
  currentTime: () => EventInstant,
  ob: LazyCancelOrderBook,
  lo: LimitOrder): (LazyCancelOrderBook, Event) = lo match {
  case b: BuyLimitOrder =>
    ob.bestOffer.exists(_.price.value <= b.price.value) match {
      case true => ??? // Omitted
      case false =>
        val orders = ob.bids.getOrElse(b.price, Queue.empty)
        ob.copy(
          bids = ob.bids + (b.price -> orders.enqueue(b)),
          activeIds = ob.activeIds + b.id) ->
          LimitOrderAdded(currentTime())
    }
  case s: SellLimitOrder =>
    ??? // Omitted
}
Continuing to defer evaluation of canceled order requests
33. What are our goals?
def handleAddLimitOrder(
  currentTime: () => EventInstant,
  ob: LazyCancelOrderBook,
  lo: LimitOrder): (LazyCancelOrderBook, Event) = lo match {
  case b: BuyLimitOrder =>
    ob.bestOffer.exists(_.price.value <= b.price.value) match {
      case true => ob.offers.headOption.fold(restLimitOrder) {
        case (p, q) => ??? // We need to fill this in
      }
      case false => restLimitOrder
Goals:
● Find active resting sell order to generate OrderExecuted event
● Remove canceled resting orders found in front of active resting sell
Given:
21.07 -> Canceled (ID = 1), Canceled (ID = 2), Canceled (ID = 3), Active (ID = 4), Canceled (ID = 5), Active (ID = 6)
We want the following final state:
21.07 -> Canceled (ID = 5), Active (ID = 6)
34. Translating our goals to code (1/3)
@tailrec
def findActiveOrder(
  q: Queue[SellLimitOrder],
  idsToRemove: Set[OrderId]):
  (Option[SellLimitOrder],         // Optionally, find an active order to generate execution
   Option[Queue[SellLimitOrder]],  // Optionally, have a non-empty queue remaining after
                                   // removing matching active order and canceled orders
   Set[OrderId]) = ???             // Set of canceled order IDs to discard
35. Translating our goals to code (2/3)
@tailrec
def findActiveOrder(
  q: Queue[SellLimitOrder],
  idsToRemove: Set[OrderId]):
  (Option[SellLimitOrder], Option[Queue[SellLimitOrder]],
   Set[OrderId]) =
  q.dequeueOption match {
    case Some((o, qq)) =>
      ob.pendingCancelIds.contains(o.id) match {
        // Found canceled order; keep recursing
        case true => findActiveOrder(qq, idsToRemove + o.id)
        // Found active order; stop recursing
        case false => (
          Some(o),
          if (qq.nonEmpty) Some(qq) else None,
          idsToRemove + o.id)
      }
    // Queue emptied without finding active order; stop recursing
    case None => (None, None, idsToRemove)
  }
36. Translating our goals to code (3/3)
// Earlier segments omitted
findActiveOrder(q, Set.empty) match {
  // Found an active order and queue is non-empty
  case (Some(o), Some(qq), rms) => (ob.copy(
    offers = ob.offers + (o.price -> qq),
    pendingCancelIds = ob.pendingCancelIds -- rms,
    activeIds = ob.activeIds -- rms),
    OrderExecuted(currentTime(),
      Execution(b.id, o.price), Execution(o.id, o.price)))
  // Found an active order and queue is empty
  case (Some(o), None, rms) => (ob.copy(
    offers = ob.offers - o.price,
    pendingCancelIds = ob.pendingCancelIds -- rms,
    activeIds = ob.activeIds -- rms),
    OrderExecuted(currentTime(),
      Execution(b.id, o.price), Execution(o.id, o.price)))
  // Since no active order was found, the price level must be empty
  case (None, _, rms) =>
    val bs = ob.bids.getOrElse(b.price, Queue.empty).enqueue(b)
    (ob.copy(bids = ob.bids + (b.price -> bs),
      offers = ob.offers - p,
      pendingCancelIds = ob.pendingCancelIds -- rms,
      activeIds = ob.activeIds -- rms + b.id),
      LimitOrderAdded(currentTime()))
}
38. How to measure?
The 3 most important rules about
microbenchmarking:
1. Use JMH
2. ?
3. ?
39. How to measure?
The 3 most important rules about
microbenchmarking:
1. Use JMH
2. Use JMH
3. ?
40. How to measure?
The 3 most important rules about
microbenchmarking:
1. Use JMH
2. Use JMH
3. Use JMH
41. The shape of a JMH test
Test state Test configuration
Benchmarks
42. JMH: Test state (1 of 2)
// Which groups of threads share the state defined below?
@State(Scope.Benchmark)
class BookWithLargeQueue {
  // What test state do we want to control when running the test?
  @Param(Array("1", "10"))
  var enqueuedOrderCount: Int = 0
  // Note var usage to manage state
  var eagerBook: QueueOrderBook = QueueOrderBook.empty
  var lazyBook: LazyCancelOrderBook =
    LazyCancelOrderBook.empty
  var cancelLast: CancelOrder =
    CancelOrder(CommandInstant.now(), OrderId(-1))
  // More state to come
}
43. JMH: Test state (2 of 2)
class BookWithLargeQueue {
  // Defined vars above
  // How often will the state be re-initialized?
  @Setup(Level.Trial)
  def setup(): Unit = {
    // Mutable state is initialized here
    cancelLast = CancelOrder(
      CommandInstant.now(), OrderId(enqueuedOrderCount))
    eagerBook = {
      (1 to enqueuedOrderCount).foldLeft(
        QueueOrderBook.empty) { case (ob, i) =>
        QueueOrderBook.handle(
          () => EventInstant.now(),
          ob,
          AddLimitOrder(
            CommandInstant.now(),
            BuyLimitOrder(OrderId(i),
              Price(BigDecimal(1.00)))))._1
      }
    }
    lazyBook = ??? // Same as eagerBook
  }
}
50. Motivating design questions
Question: What operations in my system are most performant?
Application to the order book example: Executing an order and resting an order on the book are the most performant operations. We leveraged fast execution time to perform removals of canceled orders from the book.
Question: Why am I performing all of these steps now?
Application to the order book example: Originally, order removal happened eagerly because it was the most logical way to model the process.
Question: How can I decompose the problem into smaller discrete chunks?
Application to the order book example: The act of canceling was decomposed into identifying the event sent to the requester and removing the canceled order from the book state.
Question: Can I change any constraints to allow me to model the problem differently?
Application to the order book example: Ideally, we would have liked to remove the constraint requiring rejection of non-existent orders. Unfortunately, this was out of our control.
Let’s figure out where we want to be by the end of this talk.
We want to be like this guy!
Does anyone know who this is?
Wall Street. I can’t guarantee you’ll be making Michael Douglas bucks by the end of this talk. But, you will know more about financial trading.
Let’s start at the bottom and see what we’ll cover. We will focus on fp, performance and apply it to the financial trading domain.
In doing so, we’ll dig into deferred evaluation (the name of this talk!) and the design of Scala’s Queue. Our exploration will cover design tradeoffs and JMH, and we will learn about a central concept in trading: the order book.
I’m excited to share my learnings with you. When we finish, you will have new strategies / tools in your toolbox for designing robust/performant software.
Let’s get started!
Order book is central to trading. The order book is how traders indicate buy and sell interest. Imagine a stock like Google changing in price. Here’s how the price changes.
Resting: Adds to top of book bid
Crossing: Hits the single offer at 27.03
Canceling: Remove the 27.10 offer
How can we model this concept?
TreeMap: Modeling the importance of prices by providing fast access to the lowest/highest prices.
The definition shows the key must define an Ordering. In our case, the price defines ordering based on its underlying data type – BigDecimal
Each column of the order book is represented with its own TreeMap. Looking at a particular book side, the rows are represented with Scala’s Queue.
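To make the TreeMap-plus-Queue layout concrete, here is a minimal sketch. The names BookSidesSketch, Order, Side, and side are illustrative assumptions, not the talk's actual types; only the lastOption/headOption lookups mirror the QueueOrderBook slide.

```scala
import scala.collection.immutable.{Queue, TreeMap}

object BookSidesSketch {
  // Simplified stand-ins for the talk's Price and limit order types (assumed shapes).
  final case class Price(value: BigDecimal)
  object Price {
    implicit val ordering: Ordering[Price] = Ordering.by((p: Price) => p.value)
  }
  final case class Order(id: Long, price: Price)

  // One side of the book: price levels sorted by price, FIFO queue per level.
  type Side = TreeMap[Price, Queue[Order]]

  // Hypothetical helper that builds a side by enqueueing orders in arrival order.
  def side(orders: Order*): Side =
    orders.foldLeft(TreeMap.empty[Price, Queue[Order]]) { (m, o) =>
      m.updated(o.price, m.getOrElse(o.price, Queue.empty[Order]).enqueue(o))
    }

  // Highest price level, then the oldest order resting at that level.
  def bestBid(bids: Side): Option[Order] =
    bids.lastOption.flatMap(_._2.headOption)

  // Lowest price level, then the oldest order resting at that level.
  def bestOffer(offers: Side): Option[Order] =
    offers.headOption.flatMap(_._2.headOption)

  val bids: Side = side(
    Order(1L, Price(BigDecimal("27.00"))),
    Order(2L, Price(BigDecimal("27.01"))),
    Order(3L, Price(BigDecimal("27.01"))))
  val offers: Side = side(
    Order(4L, Price(BigDecimal("27.05"))),
    Order(5L, Price(BigDecimal("27.03"))))
}
```

With these sample orders, the best bid is order 2 (the first to arrive at 27.01) and the best offer is order 5 (the only order at 27.03).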
Let’s get to know the Queue data structure better
Here’s the Scala Queue with implementation omitted and underlying data structure hidden.
The question here is: What data structure or structures does Queue use?
1st thing to notice is usage of two Lists. Why bother doing that?
A linked list supports fast prepend operations. Accessing the end of the list is slow. When we dequeue, we will want to access the end of the list.
OK, great. I may have convinced you a single linked list is a suboptimal way to represent a FIFO queue. But, how does it help to have another List?
Notice when we enqueue, out is not used. The sausage is made in dequeue.
When out is empty, a single O(N) reverse operation now gives us FIFO ordering at the head of out. This is deferred evaluation!
Let’s walk through an example to better understand Queue’s behavior.
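As a rough illustration of the two-list idea, the sketch below re-implements it in a simplified form. TwoListQueue and TwoListQueueDemo are hypothetical names; this is not the standard library source, only the same technique: enqueue prepends to in, and dequeue reverses in only when out is empty.

```scala
// Simplified stand-in for scala.collection.immutable.Queue (assumed, illustrative).
final case class TwoListQueue[A](in: List[A], out: List[A]) {
  // O(1): prepend to `in`; `out` is not touched.
  def enqueue(elem: A): TwoListQueue[A] = copy(in = elem :: in)

  // Amortized O(1): the O(N) reverse is deferred until `out` runs dry.
  def dequeueOption: Option[(A, TwoListQueue[A])] = out match {
    case x :: xs => Some((x, TwoListQueue(in, xs)))
    case Nil =>
      in.reverse match {
        case x :: xs => Some((x, TwoListQueue(Nil, xs)))
        case Nil     => None
      }
  }
}

object TwoListQueue {
  def empty[A]: TwoListQueue[A] = TwoListQueue(Nil, Nil)
}

object TwoListQueueDemo {
  // Three enqueues only touch `in`, which ends up in reverse arrival order.
  val q: TwoListQueue[Int] = TwoListQueue.empty[Int].enqueue(1).enqueue(2).enqueue(3)
  // The first dequeue pays for the reverse; later dequeues read from `out`.
  val firstDequeue: Option[(Int, TwoListQueue[Int])] = q.dequeueOption
}
```

After the three enqueues, in holds List(3, 2, 1) and out is empty. The first dequeue returns 1 and leaves out holding List(2, 3), so the next two dequeues need no reverse.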
Now that we understand how Queue works, let’s get a feeling for the runtime performance of QueueOrderBook.
Reach under your seats, grab your electronic voting device, and answer this question.
OK, OK, so there’s no voting device. Any takers on this question? Call it out!
(C) and (D) involve multiple linear time operations to perform cancel work. Clearly, they are out of contention.
Both (A) and (B) have faster runtime performance. The difference we saw was that crossing involves the amortized constant time dequeue operation.
One more question before we move on.
Which of these distributions matches what happens in the real world when Gordon Gekko is making trades?
We haven’t seen any production data yet. Let’s look at some.
Looking at this distribution, what do you think about QueueOrderBook’s performance?
Has it been optimized for the operations that happen most frequently?
This brings us to our first of four design questions. These questions will motivate our thinking in this talk.
If we considered this question before designing QueueOrderBook, we may have made different choices to optimize for cancels.
Let’s figure out what parts are particularly expensive.
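The eager cancel path can be sketched as follows, assuming a single book side keyed directly by BigDecimal. EagerCancelSketch, Order, and cancel are hypothetical names for illustration; the real QueueOrderBook handles both sides and emits events.

```scala
import scala.collection.immutable.{Queue, TreeMap}

object EagerCancelSketch {
  type OrderId = Long
  // Assumed, simplified order type.
  final case class Order(id: OrderId, price: BigDecimal)
  type Side = TreeMap[BigDecimal, Queue[Order]]

  // Returns the updated side, or None when no order with this ID is resting.
  def cancel(side: Side, id: OrderId): Option[Side] = {
    // Expensive step 1: linear scan over price levels (TreeMap.find),
    // with a linear scan inside each level (Queue.exists).
    val level = side.find { case (_, q) => q.exists(_.id == id) }
    level.map { case (price, q) =>
      // Expensive step 2: linear traversal to drop the canceled order (Queue.filter).
      val remaining = q.filter(_.id != id)
      if (remaining.isEmpty) side - price else side.updated(price, remaining)
    }
  }

  val side: Side = TreeMap(
    BigDecimal("27.01") -> Queue(
      Order(1L, BigDecimal("27.01")),
      Order(2L, BigDecimal("27.01"))))
}
```

Every cancel, including one that ends in a rejection, pays for the scans up front. That is the cost the rest of the talk tries to defer.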
The time to defer is now!
This brings us to our next motivating question. Why is all this work happening now?
For example, why do cancels immediately change the state of the book?
Probably because it is how we first considered the problem when we looked at a few real-life examples. At the time, we didn’t consider the performance tradeoffs we were making.
It’s worth reflecting on the systems you have designed. Have you fallen prey to a similar dilemma?
As a segue from our last question, it’s worth considering: How can I decompose the problem into smaller discrete chunks?
By breaking a larger logical process into smaller discrete chunks, we create opportunities to introduce deferred evaluation or otherwise optimize specific parts of the challenges facing us.
Reflecting on these two questions, we can refresh our approach to designing a performant order book.
Thinking with our deferred evaluation hats on, let’s put a mark on the wall when a cancel happens, but avoid the expensive order book processing.
We can add the orders to be canceled to a Set and then evaluate the set when crossing the book.
Here, we increase memory requirements in return for what we hope to be faster runtime performance.
Let’s look at the runtime performance of both approaches when an order is canceled. We’ll call our new approach, LazyCancelOrderBook.
Three linear time operations become a single, effectively constant time operation.
We also get a taste of what the code backing LazyCancelOrderBook looks like. To cancel an order, the state is copied with the to-be-cancelled ID added to the Set and an event is returned to signify that the order was canceled.
This operation is analogous to Queue.enqueue.
Job done! Presentation over, let’s go home.
Not so fast!
Here is a unit test that exercises an empty order book. This unit test makes use of property-based testing to set up its inputs. If this is unfamiliar to you, don’t worry.
Let’s instead focus on what we conceptually expect. If the order book is empty, there is nothing to be canceled.
This is a canonical case of rejecting order cancels. Is that what we will see? Nope!
This brings us to our final motivating design question: Can I change any constraints to allow me to model the problem differently?
One constraint that would be great to remove is supporting rejects for cancels that correspond to a non-existent or already canceled order ID.
Unfortunately for us, rejecting cancel requests is table stakes for building an order book.
But, perhaps in your domain, there are assumptions you can challenge. It’s worth considering and questioning because you might greatly simplify your problem space.
Fine! Let’s add another bit of state to help us handle the cancel reject requirement.
Similar to our treatment of to-be-canceled orders, we can use a Set to capture all active orders. This helps us answer the question: Does a cancel request refer to an order that’s in the order book?
In the bottom right, our implementation is updated to reflect the introduction of the Set of active order IDs.
This means canceling an order involves three effectively constant time operations. This is still faster than the original implementation.
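The ID bookkeeping can be isolated in a small sketch. LazyCancelSketch, CancelState, and handleCancel are hypothetical names; the price-level TreeMaps and the timestamps carried by the real events are left out so that only the two Sets remain.

```scala
object LazyCancelSketch {
  type OrderId = Long

  // Assumed, simplified events (the talk's events also carry an EventInstant).
  sealed trait Event
  final case class OrderCanceled(id: OrderId) extends Event
  final case class OrderCancelRejected(id: OrderId) extends Event

  // Only the two ID sets from LazyCancelOrderBook are modeled here.
  final case class CancelState(activeIds: Set[OrderId], pendingCancelIds: Set[OrderId])

  def handleCancel(s: CancelState, id: OrderId): (CancelState, Event) =
    if (s.activeIds.contains(id))
      // Mark the order as pending cancel; the queue itself is left untouched.
      (CancelState(s.activeIds - id, s.pendingCancelIds + id), OrderCanceled(id))
    else
      // Unknown or already-canceled ID: reject without scanning the book.
      (s, OrderCancelRejected(id))

  val initial: CancelState = CancelState(Set(1L, 2L), Set.empty)
  val afterFirstCancel: (CancelState, Event) = handleCancel(initial, 1L)
  val afterSecondCancel: (CancelState, Event) = handleCancel(afterFirstCancel._1, 1L)
}
```

Canceling ID 1 moves it from activeIds to pendingCancelIds and yields OrderCanceled. Canceling it a second time finds it absent from activeIds and yields OrderCancelRejected, with the state unchanged.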
After dealing with cancels and resting orders, we’ve finally hit that point: We can’t defer anymore.
Time to roll up our sleeves and figure out how to implement crossing the book.
What are our goals when an order crosses?
Let’s use a partial implementation of handling a limit order. In this method, we focus specifically on the case where a buy order arrived and we determined that there is a matching best offer. What do we want to happen here?
Here’s a simple example to illustrate what we want. After evaluating the incoming order, we want to remove all canceled orders prior to the matched active order.
To start, let’s write a method that returns a tuple of optionally the active, crossed order, optionally the remaining orders in the queue, and the set of canceled order IDs
You’ll see that this method is marked with the tailrec annotation. This is a strong hint that we have a recursive solution to our problem.
The method is driven by recursively evaluating the state of the provided queue, q. Evaluation ends once either an active order is found or once the queue is empty.
The operations happening here are effectively constant time. These operations will happen N times depending on the queue size.
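Here is a self-contained version of the same recursion, run against the six-order price level from the goals slide. FindActiveSketch and SellOrder are assumed names, and pendingCancelIds is passed as a parameter here, whereas on the slide it comes from the enclosing order book.

```scala
import scala.annotation.tailrec
import scala.collection.immutable.Queue

object FindActiveSketch {
  type OrderId = Long
  // Assumed, simplified order type.
  final case class SellOrder(id: OrderId)

  @tailrec
  def findActiveOrder(
      q: Queue[SellOrder],
      pendingCancelIds: Set[OrderId],
      idsToRemove: Set[OrderId])
      : (Option[SellOrder], Option[Queue[SellOrder]], Set[OrderId]) =
    q.dequeueOption match {
      // Canceled order at the head: record its ID and keep recursing.
      case Some((o, rest)) if pendingCancelIds.contains(o.id) =>
        findActiveOrder(rest, pendingCancelIds, idsToRemove + o.id)
      // Active order at the head: stop and return what is left of the queue.
      case Some((o, rest)) =>
        (Some(o), if (rest.nonEmpty) Some(rest) else None, idsToRemove + o.id)
      // Queue exhausted without finding an active order.
      case None => (None, None, idsToRemove)
    }

  // Orders 1, 2, 3, and 5 are pending cancel; 4 and 6 are active.
  val level: Queue[SellOrder] = Queue(1L, 2L, 3L, 4L, 5L, 6L).map(SellOrder(_))
  val result = findActiveOrder(level, Set(1L, 2L, 3L, 5L), Set.empty)
}
```

The result holds order 4 as the matched active order, a remaining queue of orders 5 and 6, and IDs 1 through 4 to remove from the bookkeeping sets. Order 5 stays in the queue even though it is canceled, because its removal is deferred until it reaches the head.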
Let’s return to the initial method definition we need to fill in. We can now invoke findActiveOrder and pattern match.
1. We found a sell order and there are orders remaining in the price level denoted as qq.
2. We found a sell order and there are no remaining orders in the price level.
3. We did not find a sell order. By definition, this means that we looked at all the orders in the price level.
In each case we see similar bookkeeping.
Here we pay for the deferred evaluation. While more work is being done, bear in mind that according to the historical data we reviewed, crossing the book is only the 3rd most frequent operation.
We’ve rolled up our sleeves and re-implemented the order book. The code is complete, the unit tests pass. But, how much faster, if at all, is LazyCancelOrderBook than QueueOrderBook?
How can we measure the difference?
Segue into performance and out of functional programming.
One way to measure performance is to write a small app that measures throughput.
How will we know the JVM is warmed up? And how will we ensure the JVM does not optimize away method calls when the return values are not used? How will we instrument state for each test?
JMH!
Review the JMH samples beforehand to be prepared with examples of why using JMH is a good idea.
Knowing what tool to use is half the battle. How will we use it?
Writing a JMH test involves three parts:
1. Defining the test state
2. Defining test configuration (e.g. warm-up count, JVM settings)
3. Benchmarks – the actual code under test
Let’s build a microbenchmark for the two order book implementations. We’ll start by defining the test state.
In the next few slides, our goal is to get a sense of the landscape rather than exhaustively exploring each option.
The state we define is encapsulated in a class. JMH controls configuration via annotations.
We’re trying to define state that will allow us to queue up varying sizes of orders in a price level within the order book.
As an example of one annotation, @Param allows us to sweep values when we execute the test.
So far all we have done is define the test state. No initialization yet.
Later in the same class we can now initialize the mutable state. There is a lifecycle hook of sorts via @Setup. Within this method we can initialize the mutable state.
Here we are adding state to the order book based on the number of orders we wish to queue up.
We also configure the ID of the final order added to the book to allow us to cancel the last order. This will allow us to benchmark canceling the first and last orders.
At this point we’re done with our first JMH building block: test state. Let’s now configure our tests.
In another class, CancelBenchmarks, we will soon be defining our benchmarks. First, we apply several annotations to define how we want the benchmarks run.
Can point to a few examples shown in the slide.
It’s also worthwhile to note that these values can be provided as cli args. I like defining this configuration via annotations because it ensures a consistent testing profile.
Much like a JUnit test, each benchmark is annotated with @Benchmark.
Our benchmarks focus on three cancel scenarios we’ve been considering so far:
1. Cancel 1st, 2. Cancel last, 3. Non-existent cancel
Worth noting that each of these tests returns a non-Unit value. JMH takes care of ensuring the JVM does not remove the method invocation.
Also note that our usage of immutability ensures steady state.
Let’s push the red button and kick off the tests!
The raw JMH output looks similar. One notable difference is that it does not express the trial error as a percentage of throughput. I add this because I find it convenient to review. I scrutinize this value to ensure there is limited variability in test results.
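The error-as-percentage figure is simple arithmetic on the Score and Error columns that JMH reports. A minimal sketch follows; the helper name and the sample numbers are hypothetical, not results from the talk.

```scala
object ErrorPercentSketch {
  // `score` and `error` are in the same unit (for example ops/s),
  // as in the Score and Error columns of JMH's result table.
  def errorAsPercentOfThroughput(score: Double, error: Double): Double =
    error / score * 100.0

  // Hypothetical numbers for illustration only: 2,000,000 ops/s with an error of 50,000 ops/s.
  val example: Double = errorAsPercentOfThroughput(2000000.0, 50000.0)
}
```

For the hypothetical numbers above the error works out to roughly 2.5% of throughput. A small percentage suggests the trials were stable; a large one suggests the benchmark needs more warm-up or a quieter machine before the comparison can be trusted.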
What takeaways do we have here?
One highlight here is a clear win in throughput (higher is better) for canceling the first order, independent of the enqueued order count.
Why does the magnitude of the ops decrease as enqueued order count increases? These are the kinds of questions worth reflecting on to understand the results.
What else would you need to do in order to be comfortable?
How about designing a test that matches the frequency of operations seen in production?
Load testing in a staging environment?
Analyzing memory usage? We potentially dramatically increased memory usage and changed GC patterns due to long-lived order IDs.
The goal is to make you think and consider the tradeoffs. Often pausing to consider is enough to uncover serious flaws.
And if that’s not enough, then you can exercise your well-practiced rollback plan.
Before we conclude I want to rehash the motivating questions we asked ourselves while working on the order book.
Draw parallels to your own work.