ScalaBlitz 
Efficient Collections Framework
What’s a Blitz?
Blitz chess is a style of rapid chess play.
Knights have horses.
Horses run fast.
def mean(xs: Array[Float]): Float =
  xs.par.reduce(_ + _) / xs.length
With Lists, operations can only be executed from left to right.
[diagram: a list containing the elements 1, 2, 4, 8]
Not your typical list.
Bon app.
Apparently not enough.
No amount of documentation is apparently enough.
reduceLeft guarantees that operations are executed from left to right.
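For example, with a non-associative operator like subtraction the left-to-right order is directly observable; a parallel reduce is free to regroup the operations, so it needs an associative operator.

List(1, 2, 4, 8).reduceLeft(_ - _)   // evaluates ((1 - 2) - 4) - 8 == -13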
Parallel and sequential collections sharing operations.
There are several problems here.
How we see users.
How users see the docs.
Bending the truth.
And sometimes we were just slow.
So, we have a new API now 
def findDoe(names: Array[String]): Option[String] = {
  names.toPar.find(_.endsWith("Doe"))
}
Wait, you renamed a method?
Yeah, par already exists. But toPar is different.
implicit class ParOps[Repr](val r: Repr) extends AnyVal {
  def toPar = new Par(r)
}
def findDoe(names: Array[String]): Option[String] = {
  ParOps(names).toPar.find(_.endsWith("Doe"))
}
class Par[Repr](r: Repr)
def findDoe(names: Array[String]): Option[String] = {
  (new Par(names)).find(_.endsWith("Doe"))
}
But Par[Repr] does not have the find method!
True, but Par[Array[String]] does have a find method.
implicit class ParArrayOps[T](pa: Par[Array[T]]) {
  ...
  def find(p: T => Boolean): Option[T]
  ...
}
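A small usage sketch (illustrative, assuming the Par and ParArrayOps definitions above): because the operations are attached to Par[Array[T]] specifically, find resolves at compile time for arrays, and a Par collection without such ops simply does not compile.

val names = Array("John Doe", "Jane")
new Par(names).find(_.endsWith("Doe"))        // compiles: ParArrayOps supplies find for Par[Array[String]]
// new Par(List("John Doe")).find(_ => true)  // would not compile: no ops defined for Par[List[T]]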
No standard library collections were hurt doing this.
More flexible!
● does not have to implement methods that make no sense in parallel
● slow conversions are explicit
● non-intrusive addition to the standard library
● easy to add new methods and collections
● an import switches between implementations
def findDoe(names: Seq[String]): Option[String] = {
  names.toPar.find(_.endsWith("Doe"))
}

But how do I write generic code?
def findDoe[Repr[_]](names: Par[Repr[String]]) = {
  names.find(_.endsWith("Doe"))
}
Par[Repr[String]] does not have a find.
def findDoe[Repr[_]: Ops](names: Par[Repr[String]]) = {
  names.find(_.endsWith("Doe"))
}
We don’t do this.
Make everything as simple as possible, but not simpler.
def findDoe(names: Reducable[String]) = {
  names.find(_.endsWith("Doe"))
}
findDoe(Array("John", "Jane Doe").toPar)
findDoe(toReducable(Array("John", "Jane Doe").toPar))
def arrayIsReducable[T]: IsReducable[T] = { … }
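A minimal sketch of the shape of this machinery (illustrative only; the actual ScalaBlitz Reducable and IsReducable definitions differ): a generic interface plus a conversion that adapts a concrete Par collection to it.

trait Reducable[T] {
  def find(p: T => Boolean): Option[T]
}

// Assumed shape: adapt Par[Array[T]] to the generic view by delegating
// to the array-specific parallel ops shown earlier.
def toReducable[T](pa: Par[Array[T]]): Reducable[T] = new Reducable[T] {
  def find(p: T => Boolean): Option[T] = pa.find(p)
}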
So let’s write a program!
import scala.collection.par._

val pixels = new Array[Int](wdt * hgt)
for (idx <- (0 until (wdt * hgt)).toPar) {
  val x = idx % wdt
  val y = idx / wdt
  pixels(idx) = computeColor(x, y)
}

Scheduler not found!
import scala.collection.par._
import Scheduler.Implicits.global

val pixels = new Array[Int](wdt * hgt)
for (idx <- (0 until (wdt * hgt)).toPar) {
  val x = idx % wdt
  val y = idx / wdt
  pixels(idx) = computeColor(x, y)
}
New parallel collections: 33% faster!
Previously: 148 ms
Now: 103 ms
Workstealing tree scheduler rocks!
But are there other interesting workloads?
Fine-grained uniform workloads are on the opposite side of the spectrum.
def mean(xs: Array[Float]): Float = {
  val sum = xs.toPar.fold(0.0f)(_ + _)
  sum / xs.length
}
Previously: 565 ms
Now: 15 ms
But how?
def fold[T](a: Iterable[T])(z: T)(op: (T, T) => T) = {
  var it = a.iterator
  var acc = z
  while (it.hasNext) {
    acc = op(acc, it.next)
  }
  acc
}
def fold[T](a: Iterable[T])(z: T)(op: (T, T) => T) = {
  var it = a.iterator
  var acc = z
  while (it.hasNext) {
    acc = box(op(acc, it.next))
  }
  acc
}
Generic methods cause boxing of primitives
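A small side-by-side illustration (not ScalaBlitz code; genericSum and primitiveSum are just names for this sketch): with a generic element type the accumulator and the function arguments are objects after erasure, so every Float gets boxed, while a loop written against the primitive type allocates nothing.

def genericSum[T](xs: Array[T], z: T)(op: (T, T) => T): T = {
  var acc = z
  var i = 0
  while (i < xs.length) {
    acc = op(acc, xs(i))   // every element is boxed to pass through the generic op
    i += 1
  }
  acc
}

def primitiveSum(xs: Array[Float]): Float = {
  var acc = 0.0f
  var i = 0
  while (i < xs.length) {
    acc += xs(i)           // stays an unboxed Float throughout
    i += 1
  }
  acc
}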
Generic methods hurt performance. What can we do instead?
Inline method body!
def mean(xs: Array[Float]): Float = {
  val sum = {
    var it = xs.iterator
    var acc = 0.0f
    while (it.hasNext) {
      acc = acc + it.next
    }
    acc
  }
  sum / xs.length
}
Specific type 
No boxing! 
No memory allocation!
565 ms → 281 ms 
2x speedup
Iterators? For Array? We don't need them!
def mean(xs: Array[Float]): Float = {
  val sum = {
    var i = 0
    val until = xs.length
    var acc = 0.0f
    while (i < until) {
      acc = acc + xs(i)
      i = i + 1
    }
    acc
  }
  sum / xs.length
}

Use index-based access!
281 ms → 15 ms 
19x speedup
Are those optimizations parallel-collections specific?
No, you can use them on sequential collections.
def mean(xs: Array[Float]): Float = {
  val sum = xs.fold(0.0f)(_ + _)
  sum / xs.length
}
import scala.collection.optimizer._

def mean(xs: Array[Float]): Float = optimize {
  val sum = xs.fold(0.0f)(_ + _)
  sum / xs.length
}
You get a 38x speedup!
Future work
@specialized collections
● Maps
● Sets
● Lists
● Vectors
Both faster and consuming less memory.
Expect to get this for free inside the optimize{} block.
JDK8-style streams (parallel views)
● Fast
● Lightweight
● Expressive API
● Optimized
Lazy data-parallel operations made easy.
Future-based asynchronous API

val sum = future { xs.sum }
val normalized = sum.map(s => s / xs.size)

Boilerplate code, and ugly.
Future-based asynchronous API

val sum = xs.toFuture.sum
val scaled = xs.map(_ / sum)

● Simple to use
● Lightweight
● Expressive API
● Optimized

Asynchronous data-parallel operations made easy.
Current research: operation fusion

val minMaleAge = people.filter(_.isMale)
                       .map(_.age).min
val minFemaleAge = people.filter(_.isFemale)
                         .map(_.age).min

● Requires up to 3 times more memory than the original collection
● Requires 6 traversals of the collection

We aim to reduce this to a single traversal with no additional memory,
without you changing your code.
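For intuition, the fused result is equivalent to a hand-written single pass like the sketch below (illustrative only; it assumes people holds elements with isMale, isFemale and an integer age, and it is not what the optimizer would literally generate):

var minMaleAge = Int.MaxValue
var minFemaleAge = Int.MaxValue
for (p <- people) {                       // one traversal, no intermediate collections
  if (p.isMale)   minMaleAge   = math.min(minMaleAge, p.age)
  if (p.isFemale) minFemaleAge = math.min(minFemaleAge, p.age)
}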

ScalaBlitz