ScalaBlitz 
Efficient Collections Framework
What’s a Blitz?
Blitz chess is a style of rapid chess play.
Knights have horses.
Horses run fast.
def mean(xs: Array[Float]): Float =
  xs.par.reduce(_ + _) / xs.length
With Lists, operations can only be executed from left to right.
[diagram: a list containing the elements 1, 2, 4, 8]
Not your typical list.
Bon app.
Apparently not enough.
No amount of documentation is apparently enough.
reduceLeft guarantees that operations are executed from left to right.
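For example, with a non-associative operator like subtraction the left-to-right order is directly observable; a parallel reduce is free to regroup the operations, so it needs an associative operator.

List(1, 2, 4, 8).reduceLeft(_ - _)   // evaluates ((1 - 2) - 4) - 8 == -13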
Parallel and sequential collections sharing operations.
There are several problems here.
How we see users.
How users see the docs.
Bending the truth.
And sometimes we were just slow.
So, we have a new API now 
def findDoe(names: Array[String]): Option[String] = {
  names.toPar.find(_.endsWith("Doe"))
}
Wait, you renamed a method?
Yeah, par already exists. But toPar is different.
implicit class ParOps[Repr](val r: Repr) extends AnyVal {
  def toPar = new Par(r)
}
def findDoe(names: Array[String]): Option[String] = {
  ParOps(names).toPar.find(_.endsWith("Doe"))
}
class Par[Repr](r: Repr)
def findDoe(names: Array[String]): Option[String] = {
  (new Par(names)).find(_.endsWith("Doe"))
}
But Par[Repr] does not have the find method!
True, but Par[Array[String]] does have a find method.
implicit class ParArrayOps[T](pa: Par[Array[T]]) {
  ...
  def find(p: T => Boolean): Option[T]
  ...
}
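A small usage sketch (illustrative, assuming the Par and ParArrayOps definitions above): because the operations are attached to Par[Array[T]] specifically, find resolves at compile time for arrays, and a Par collection without such ops simply does not compile.

val names = Array("John Doe", "Jane")
new Par(names).find(_.endsWith("Doe"))        // compiles: ParArrayOps supplies find for Par[Array[String]]
// new Par(List("John Doe")).find(_ => true)  // would not compile: no ops defined for Par[List[T]]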
No standard library collections were hurt doing this.
More flexible!
● does not have to implement methods that make no sense in parallel
● slow conversions are explicit
● non-intrusive addition to the standard library
● easy to add new methods and collections
● an import switches between implementations
def findDoe(names: Seq[String]): Option[String] = {
  names.toPar.find(_.endsWith("Doe"))
}

But how do I write generic code?
def findDoe[Repr[_]](names: Par[Repr[String]]) = {
  names.find(_.endsWith("Doe"))
}
Par[Repr[String]] does not have a find.
def findDoe[Repr[_]: Ops](names: Par[Repr[String]]) = {
  names.find(_.endsWith("Doe"))
}
We don’t do this.
Make everything as simple as possible, but not simpler.
def findDoe(names: Reducable[String]) = {
  names.find(_.endsWith("Doe"))
}
findDoe(Array("John", "Jane Doe").toPar)
findDoe(toReducable(Array("John", "Jane Doe").toPar))
def arrayIsReducable[T]: IsReducable[T] = { … }
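A minimal sketch of the shape of this machinery (illustrative only; the actual ScalaBlitz Reducable and IsReducable definitions differ): a generic interface plus a conversion that adapts a concrete Par collection to it.

trait Reducable[T] {
  def find(p: T => Boolean): Option[T]
}

// Assumed shape: adapt Par[Array[T]] to the generic view by delegating
// to the array-specific parallel ops shown earlier.
def toReducable[T](pa: Par[Array[T]]): Reducable[T] = new Reducable[T] {
  def find(p: T => Boolean): Option[T] = pa.find(p)
}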
So let’s write a program!
import scala.collection.par._

val pixels = new Array[Int](wdt * hgt)
for (idx <- (0 until (wdt * hgt)).toPar) {
  val x = idx % wdt
  val y = idx / wdt
  pixels(idx) = computeColor(x, y)
}

Scheduler not found!
import scala.collection.par._
import Scheduler.Implicits.global

val pixels = new Array[Int](wdt * hgt)
for (idx <- (0 until (wdt * hgt)).toPar) {
  val x = idx % wdt
  val y = idx / wdt
  pixels(idx) = computeColor(x, y)
}
New parallel collections: 33% faster!
Previously: 148 ms
Now: 103 ms
Workstealing tree scheduler rocks!
But are there other interesting workloads?
Fine-grained uniform workloads are on the opposite side of the spectrum.
def mean(xs: Array[Float]): Float = {
  val sum = xs.toPar.fold(0.0f)(_ + _)
  sum / xs.length
}
Previously: 565 ms
Now: 15 ms
But how?
def fold[T](a: Iterable[T])(z: T)(op: (T, T) => T) = {
  var it = a.iterator
  var acc = z
  while (it.hasNext) {
    acc = op(acc, it.next)
  }
  acc
}
def fold[T](a: Iterable[T])(z: T)(op: (T, T) => T) = {
  var it = a.iterator
  var acc = z
  while (it.hasNext) {
    acc = box(op(acc, it.next))
  }
  acc
}
Generic methods cause boxing of primitives
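A small side-by-side illustration (not ScalaBlitz code; genericSum and primitiveSum are just names for this sketch): with a generic element type the accumulator and the function arguments are objects after erasure, so every Float gets boxed, while a loop written against the primitive type allocates nothing.

def genericSum[T](xs: Array[T], z: T)(op: (T, T) => T): T = {
  var acc = z
  var i = 0
  while (i < xs.length) {
    acc = op(acc, xs(i))   // every element is boxed to pass through the generic op
    i += 1
  }
  acc
}

def primitiveSum(xs: Array[Float]): Float = {
  var acc = 0.0f
  var i = 0
  while (i < xs.length) {
    acc += xs(i)           // stays an unboxed Float throughout
    i += 1
  }
  acc
}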
Generic methods hurt performance. What can we do instead?
Inline method body!
def mean(xs: Array[Float]): Float = {
  val sum = {
    var it = xs.iterator
    var acc = 0.0f
    while (it.hasNext) {
      acc = acc + it.next
    }
    acc
  }
  sum / xs.length
}
Specific type 
No boxing! 
No memory allocation!
565 ms → 281 ms 
2x speedup
Iterators? For Array? We don't need them!
def mean(xs: Array[Float]): Float = {
  val sum = {
    var i = 0
    val until = xs.length
    var acc = 0.0f
    while (i < until) {
      acc = acc + xs(i)
      i = i + 1
    }
    acc
  }
  sum / xs.length
}

Use index-based access!
281 ms → 15 ms 
19x speedup
Are those optimizations parallel-collections specific?
No, you can use them on sequential collections.
def mean(xs: Array[Float]): Float = {
  val sum = xs.fold(0.0f)(_ + _)
  sum / xs.length
}
import scala.collection.optimizer._

def mean(xs: Array[Float]): Float = optimize {
  val sum = xs.fold(0.0f)(_ + _)
  sum / xs.length
}
You get a 38x speedup!
Future work
@specialized collections
● Maps
● Sets
● Lists
● Vectors
Both faster and consuming less memory.
Expect to get this for free inside the optimize{} block.
JDK8-style streams (parallel views)
● Fast
● Lightweight
● Expressive API
● Optimized
Lazy data-parallel operations made easy.
Future-based asynchronous API

val sum = future { xs.sum }
val normalized = sum.map(s => s / xs.size)

Boilerplate code, and ugly.
Future-based asynchronous API

val sum = xs.toFuture.sum
val scaled = xs.map(_ / sum)

● Simple to use
● Lightweight
● Expressive API
● Optimized

Asynchronous data-parallel operations made easy.
Current research: operation fusion

val minMaleAge = people.filter(_.isMale)
                       .map(_.age).min
val minFemaleAge = people.filter(_.isFemale)
                         .map(_.age).min

● Requires up to 3 times more memory than the original collection
● Requires 6 traversals of the collection

We aim to reduce this to a single traversal with no additional memory,
without you changing your code.
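For intuition, the fused result is equivalent to a hand-written single pass like the sketch below (illustrative only; it assumes people holds elements with isMale, isFemale and an integer age, and it is not what the optimizer would literally generate):

var minMaleAge = Int.MaxValue
var minFemaleAge = Int.MaxValue
for (p <- people) {                       // one traversal, no intermediate collections
  if (p.isMale)   minMaleAge   = math.min(minMaleAge, p.age)
  if (p.isFemale) minFemaleAge = math.min(minFemaleAge, p.age)
}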

ScalaBlitz