
ScalaBlitz


Scala eXchange 2013 presentation of ScalaBlitz, a highly optimized data-parallel collection framework for Scala.


  1. ScalaBlitz Efficient Collections Framework
  2. What’s a Blitz?
  3. Blitz-chess is a style of rapid chess play.
  4. Blitz-chess is a style of rapid chess play.
  5. Knights have horses.
  6. Horses run fast.
  7. def mean(xs: Array[Float]): Float = xs.par.reduce(_ + _) / xs.length
  8. With Lists, operations can only be executed from left to right
  9. 1 2 4 8
  10. 1 2 4 8 Not your typical list.
  11. Bon app.
  12. Apparently not enough
  13. No amount of documentation is apparently enough
  14. The reduceLeft guarantees operations are executed from left to right
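The left-to-right guarantee can be demonstrated with the standard library alone: under `reduceLeft` even a non-associative operator gives a deterministic result, whereas a parallel `reduce` is free to regroup operands and is therefore only correct for associative operators.

```scala
// Standard-library demo (no ScalaBlitz needed): reduceLeft applies the
// operator strictly left to right, so even a non-associative operator like
// subtraction yields a deterministic result. A parallel reduce may regroup
// operands, so it is only correct for associative operators such as +.
val xs = List(1.0, 2.0, 4.0, 8.0)

val leftToRight = xs.reduceLeft(_ - _) // ((1 - 2) - 4) - 8
println(leftToRight) // -13.0

val sum = xs.reduce(_ + _) // any grouping gives the same result for +
println(sum) // 15.0
```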
  15. Parallel and sequential collections sharing operations
  16. There are several problems here
  17. How we see users
  18. How users see the docs
  19. Bending the truth.
  20. And sometimes we were just slow
  21. So, we have a new API now def findDoe(names: Array[String]): Option[String] = { names.toPar.find(_.endsWith("Doe")) }
  22. Wait, you renamed a method? def findDoe(names: Array[String]): Option[String] = { names.toPar.find(_.endsWith("Doe")) }
  23. Yeah, par already exists. But, toPar is different. def findDoe(names: Array[String]): Option[String] = { names.toPar.find(_.endsWith("Doe")) }
  24. def findDoe(names: Array[String]): Option[String] = { names.toPar.find(_.endsWith("Doe")) } implicit class ParOps[Repr](val r: Repr) extends AnyVal { def toPar = new Par(r) }
  25. def findDoe(names: Array[String]): Option[String] = { ParOps(names).toPar.find(_.endsWith("Doe")) } implicit class ParOps[Repr](val r: Repr) extends AnyVal { def toPar = new Par(r) }
  26. def findDoe(names: Array[String]): Option[String] = { ParOps(names).toPar.find(_.endsWith("Doe")) } implicit class ParOps[Repr](val r: Repr) extends AnyVal { def toPar = new Par(r) } class Par[Repr](r: Repr)
  27. def findDoe(names: Array[String]): Option[String] = { (new Par(names)).find(_.endsWith("Doe")) } implicit class ParOps[Repr](val r: Repr) extends AnyVal { def toPar = new Par(r) } class Par[Repr](r: Repr)
  28. def findDoe(names: Array[String]): Option[String] = { (new Par(names)).find(_.endsWith("Doe")) } class Par[Repr](r: Repr) But, Par[Repr] does not have the find method!
  29. def findDoe(names: Array[String]): Option[String] = { (new Par(names)).find(_.endsWith("Doe")) } class Par[Repr](r: Repr) True, but Par[Array[String]] does have a find method.
  30. def findDoe(names: Array[String]): Option[String] = { (new Par(names)).find(_.endsWith("Doe")) } class Par[Repr](r: Repr) implicit class ParArrayOps[T](pa: Par[Array[T]]) { ... def find(p: T => Boolean): Option[T] ... }
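The pattern on slides 24-30 can be sketched end to end in plain Scala. This toy version runs sequentially; the real ScalaBlitz `find` would search in parallel, and only the shape of the implicits matches the slides.

```scala
// Minimal, self-contained sketch of the extension pattern on slides 24-30:
// Par[Repr] itself has no operations; an implicit class indexed on the
// concrete type (here Par[Array[T]]) contributes `find`. Sequential toy only.
class Par[Repr](val r: Repr)

implicit class ParOps[Repr](r: Repr) {
  def toPar = new Par(r)
}

// Only Par[Array[T]] gets a find method; Par of another Repr does not.
implicit class ParArrayOps[T](pa: Par[Array[T]]) {
  def find(p: T => Boolean): Option[T] = pa.r.find(p)
}

def findDoe(names: Array[String]): Option[String] =
  names.toPar.find(_.endsWith("Doe"))

println(findDoe(Array("John Smith", "Jane Doe"))) // Some(Jane Doe)
```

Because the operations live in type-indexed implicit classes rather than on `Par` itself, a collection that cannot support an operation in parallel simply never receives it.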
  31. More flexible!
  32. More flexible! ● does not have to implement methods that make no sense in parallel
  33. More flexible! ● does not have to implement methods that make no sense in parallel ● slow conversions explicit
  34. No standard library collections were hurt doing this.
  35. More flexible! ● does not have to implement methods that make no sense in parallel ● slow conversions explicit ● non-intrusive addition to standard library
  36. More flexible! ● does not have to implement methods that make no sense in parallel ● slow conversions explicit ● non-intrusive addition to standard library ● easy to add new methods and collections
  37. More flexible! ● does not have to implement methods that make no sense in parallel ● slow conversions explicit ● non-intrusive addition to standard library ● easy to add new methods and collections ● import switches between implementations
  38. def findDoe(names: Seq[String]): Option[String] = { names.toPar.find(_.endsWith("Doe")) }
  39. def findDoe(names: Seq[String]): Option[String] = { names.toPar.find(_.endsWith("Doe")) }
  40. def findDoe(names: Seq[String]): Option[String] = { names.toPar.find(_.endsWith("Doe")) } But how do I write generic code?
  41. def findDoe[Repr[_]](names: Par[Repr[String]]) = { names.find(_.endsWith("Doe")) }
  42. def findDoe[Repr[_]](names: Par[Repr[String]]) = { names.find(_.endsWith("Doe")) } Par[Repr[String]] does not have a find
  43. def findDoe[Repr[_]: Ops](names: Par[Repr[String]]) = { names.find(_.endsWith("Doe")) }
  44. def findDoe[Repr[_]: Ops](names: Par[Repr[String]]) = { names.find(_.endsWith("Doe")) } We don’t do this.
  45. Make everything as simple as possible, but not simpler.
  46. def findDoe(names: Reducable[String]) = { names.find(_.endsWith("Doe")) }
  47. def findDoe(names: Reducable[String]) = { names.find(_.endsWith("Doe")) } findDoe(Array("1", "2", "3").toPar)
  48. def findDoe(names: Reducable[String]) = { names.find(_.endsWith("Doe")) } findDoe(toReducable(Array("1", "2", "3").toPar))
  49. def findDoe(names: Reducable[String]) = { names.find(_.endsWith("Doe")) } findDoe(toReducable(Array("1", "2", "3").toPar)) def arrayIsReducable[T]: IsReducable[T] = { … }
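The `Reducable` approach on slides 46-49 can be sketched with an implicit adapter. Names follow the slides, but the trait body and the adapter are hypothetical; the real ScalaBlitz signatures may differ.

```scala
// Hypothetical sketch of the Reducable idea from slides 46-49: generic code
// targets a small abstract interface instead of a higher-kinded Ops type
// class, and concrete collections are adapted to it by an implicit
// conversion (playing the role of the slides' arrayIsReducable).
import scala.language.implicitConversions

trait Reducable[T] {
  def find(p: T => Boolean): Option[T]
}

implicit def arrayIsReducable[T](xs: Array[T]): Reducable[T] =
  new Reducable[T] {
    def find(p: T => Boolean): Option[T] = xs.find(p)
  }

def findDoe(names: Reducable[String]): Option[String] =
  names.find(_.endsWith("Doe"))

println(findDoe(Array("John Smith", "Jane Doe"))) // Some(Jane Doe)
```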
  50. So let’s write a program!
  51. import scala.collection.par._ val pixels = new Array[Int](wdt * hgt) for (idx <- (0 until (wdt * hgt)).toPar) { }
  52. import scala.collection.par._ val pixels = new Array[Int](wdt * hgt) for (idx <- (0 until (wdt * hgt)).toPar) { val x = idx % wdt val y = idx / wdt }
  53. import scala.collection.par._ val pixels = new Array[Int](wdt * hgt) for (idx <- (0 until (wdt * hgt)).toPar) { val x = idx % wdt val y = idx / wdt pixels(idx) = computeColor(x, y) }
  54. import scala.collection.par._ val pixels = new Array[Int](wdt * hgt) for (idx <- (0 until (wdt * hgt)).toPar) { val x = idx % wdt val y = idx / wdt pixels(idx) = computeColor(x, y) } Scheduler not found!
  55. import scala.collection.par._ import Scheduler.Implicits.global val pixels = new Array[Int](wdt * hgt) for (idx <- (0 until (wdt * hgt)).toPar) { val x = idx % wdt val y = idx / wdt pixels(idx) = computeColor(x, y) }
  56. import scala.collection.par._ import Scheduler.Implicits.global val pixels = new Array[Int](wdt * hgt) for (idx <- (0 until (wdt * hgt)).toPar) { val x = idx % wdt val y = idx / wdt pixels(idx) = computeColor(x, y) }
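A sequential plain-Scala version of the loop above spells out the index arithmetic: a flat index `idx` maps to pixel coordinates `x = idx % wdt`, `y = idx / wdt`. Here `computeColor` is a hypothetical stand-in for the real shading function.

```scala
// Sequential version of the pixel loop on slides 51-56, standard library
// only. computeColor below is a made-up placeholder, not the deck's code.
val wdt = 4
val hgt = 3
val pixels = new Array[Int](wdt * hgt)

def computeColor(x: Int, y: Int): Int = (x << 16) | y // hypothetical shading

for (idx <- 0 until (wdt * hgt)) {
  val x = idx % wdt // column within the row
  val y = idx / wdt // row index
  pixels(idx) = computeColor(x, y)
}

println(pixels(wdt + 1)) // pixel at (x = 1, y = 1): (1 << 16) | 1 = 65537
```

Because every iteration touches a distinct `pixels(idx)`, the loop is embarrassingly parallel, which is why swapping in `.toPar` is safe.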
  57. New parallel collections 33% faster! Now 103 ms, previously 148 ms
  58. Workstealing tree scheduler rocks!
  59. Workstealing tree scheduler rocks! But, are there other interesting workloads?
  60. Fine-grained uniform workloads are on the opposite side of the spectrum.
  61. def mean(xs: Array[Float]): Float = { val sum = xs.toPar.fold(0.0f)(_ + _) sum / xs.length }
  62. def mean(xs: Array[Float]): Float = { val sum = xs.toPar.fold(0.0f)(_ + _) sum / xs.length } Now 15 ms, previously 565 ms
  63. But how?
  64. def fold[T](a: Iterable[T])(z: T)(op: (T, T) => T) = { val it = a.iterator var acc = z while (it.hasNext) { acc = op(acc, it.next) } acc }
  65. def fold[T](a: Iterable[T])(z: T)(op: (T, T) => T) = { val it = a.iterator var acc = z while (it.hasNext) { acc = box(op(acc, it.next)) } acc }
  66. def fold[T](a: Iterable[T])(z: T)(op: (T, T) => T) = { val it = a.iterator var acc = z while (it.hasNext) { acc = box(op(acc, it.next)) } acc } Generic methods cause boxing of primitives
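The boxing on slides 64-66 comes from erasure: `acc` and `op` work on `Object`, so for `T = Float` every iteration allocates a `java.lang.Float`. One standard mitigation is a `@specialized` type parameter, which makes the compiler emit a primitive variant; this sketch contrasts the two (the closure call can still box, since `Function2` is not specialized for `Float`, which is why ScalaBlitz inlines the operation instead, as the next slides show).

```scala
// Generic fold: for primitive T, acc lives as a boxed Object per iteration.
def foldGeneric[T](a: Iterable[T])(z: T)(op: (T, T) => T): T = {
  val it = a.iterator
  var acc = z
  while (it.hasNext) acc = op(acc, it.next()) // boxed accumulator
  acc
}

// Specialized fold over arrays: the Float variant keeps acc as a primitive.
def foldSpecialized[@specialized(Float) T](a: Array[T])(z: T)(op: (T, T) => T): T = {
  var i = 0
  var acc = z // unboxed float in the specialized variant
  while (i < a.length) { acc = op(acc, a(i)); i += 1 }
  acc
}

println(foldGeneric(Array(1.0f, 2.0f, 4.0f))(0.0f)(_ + _))     // 7.0
println(foldSpecialized(Array(1.0f, 2.0f, 4.0f))(0.0f)(_ + _)) // 7.0
```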
  67. def mean(xs: Array[Float]): Float = { val sum = xs.toPar.fold(0.0f)(_ + _) sum / xs.length }
  68. def mean(xs: Array[Float]): Float = { val sum = xs.toPar.fold(0.0f)(_ + _) sum / xs.length } Generic methods hurt performance What can we do instead?
  69. def mean(xs: Array[Float]): Float = { val sum = xs.toPar.fold(0.0f)(_ + _) sum / xs.length } Generic methods hurt performance What can we do instead? Inline the method body!
  70. def mean(xs: Array[Float]): Float = { val sum = { val it = xs.iterator var acc = 0.0f while (it.hasNext) { acc = acc + it.next } acc } sum / xs.length }
  71. def mean(xs: Array[Float]): Float = { val sum = { val it = xs.iterator var acc = 0.0f while (it.hasNext) { acc = acc + it.next } acc } sum / xs.length } Specific type No boxing! No memory allocation!
  72. def mean(xs: Array[Float]): Float = { val sum = { val it = xs.iterator var acc = 0.0f while (it.hasNext) { acc = acc + it.next } acc } sum / xs.length } Specific type No boxing! No memory allocation! 565 ms → 281 ms 2x speedup
  73. def mean(xs: Array[Float]): Float = { val sum = { val it = xs.iterator var acc = 0.0f while (it.hasNext) { acc = acc + it.next } acc } sum / xs.length }
  74. def mean(xs: Array[Float]): Float = { val sum = { val it = xs.iterator var acc = 0.0f while (it.hasNext) { acc = acc + it.next } acc } sum / xs.length } Iterators? For an Array? We don’t need them!
  75. def mean(xs: Array[Float]): Float = { val sum = { var i = 0 val until = xs.length var acc = 0.0f while (i < until) { acc = acc + xs(i) i = i + 1 } acc } sum / xs.length } Use index-based access!
  76. def mean(xs: Array[Float]): Float = { val sum = { var i = 0 val until = xs.length var acc = 0.0f while (i < until) { acc = acc + xs(i) i = i + 1 } acc } sum / xs.length } Use index-based access! 281 ms → 15 ms 19x speedup
  77. Are those optimizations parallel-collections specific?
  78. Are those optimizations parallel-collections specific? No
  79. Are those optimizations parallel-collections specific? No You can use them on sequential collections
  80. def mean(xs: Array[Float]): Float = { val sum = xs.fold(0.0f)(_ + _) sum / xs.length }
  81. import scala.collection.optimizer._ def mean(xs: Array[Float]): Float = optimize { val sum = xs.fold(0.0f)(_ + _) sum / xs.length }
  82. import scala.collection.optimizer._ def mean(xs: Array[Float]): Float = optimize { val sum = xs.fold(0.0f)(_ + _) sum / xs.length } You get a 38x speedup!
  83. Future work
  84. @specialized collections ● Maps ● Sets ● Lists ● Vectors Both faster & consuming less memory
  85. @specialized collections ● Maps ● Sets ● Lists ● Vectors Both faster & consuming less memory Expect to get this for free inside an optimize{} block
  86. jdk8-style streams (parallel views) ● Fast ● Lightweight ● Expressive API ● Optimized Lazy data-parallel operations made easy
  87. Future-based asynchronous API val sum = future { xs.sum } val normalized = sum.andThen(sum => sum / xs.size) Boilerplate code, ugly
  88. Future-based asynchronous API val sum = xs.toFuture.sum val scaled = xs.map(_ / sum) ● Simple to use ● Lightweight ● Expressive API ● Optimized Asynchronous data-parallel operations made easy
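The "boilerplate" version on slide 87 corresponds to plain `scala.concurrent` code like the following; the `xs.toFuture.sum` style on slide 88 is the proposed sugar over exactly this kind of plumbing. (`future { ... }` in the slide is the pre-2.11 spelling of `Future { ... }`; the `map` here replaces the slide's `andThen`, since in the standard API `andThen` is for side effects only.)

```scala
// Standard-library version of the asynchronous mean from slide 87.
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val xs = Array(1.0f, 2.0f, 3.0f)

val sum: Future[Float] = Future { xs.sum }            // compute sum off-thread
val mean: Future[Float] = sum.map(s => s / xs.length) // then normalize

println(Await.result(mean, 1.second)) // 2.0
```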
  89. Current research: operation fusion val minMaleAge = people.filter(_.isMale).map(_.age).min val minFemaleAge = people.filter(_.isFemale).map(_.age).min
  90. Current research: operation fusion val minMaleAge = people.filter(_.isMale).map(_.age).min val minFemaleAge = people.filter(_.isFemale).map(_.age).min ● Requires up to 3 times more memory than the original collection ● Requires 6 traversals of the collections
  91. Current research: operation fusion val minMaleAge = people.filter(_.isMale).map(_.age).min val minFemaleAge = people.filter(_.isFemale).map(_.age).min ● Requires up to 3 times more memory than the original collection ● Requires 6 traversals of the collections We aim to reduce this to a single traversal with no additional memory, without you changing your code.
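A hand-fused version of the two pipelines on slide 89 shows the target: one traversal and no intermediate collections. This is roughly the code the fusion research aims to derive automatically, so that users can keep the declarative version. The `Person` class here is a made-up stand-in for whatever `people` contains.

```scala
// Hand-fused equivalent of:
//   people.filter(_.isMale).map(_.age).min
//   people.filter(_.isFemale).map(_.age).min
// One pass, no intermediate collections.
case class Person(age: Int, isMale: Boolean)

val people = Seq(Person(30, true), Person(25, false), Person(40, true))

var minMaleAge = Int.MaxValue
var minFemaleAge = Int.MaxValue
for (p <- people) {
  if (p.isMale) minMaleAge = math.min(minMaleAge, p.age)
  else minFemaleAge = math.min(minFemaleAge, p.age)
}

println((minMaleAge, minFemaleAge)) // (30,25)
```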
