Scala parallel-collections


Published on

Parallel Collections in Scala. What to keep note of and watchouts.

Published in: Technology
1 Like
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Scala parallel-collections

  1. 1. Parallel Collections with ScalaJul 6 2012 > Vikas Hazrati > > @vhazrati
  2. 2. MotivationMultiple-coresPopular Parallel Programming remains a formidable challenge. Implicit Parallelism
  3. 3. scala> val list = (1 to 10000).toList scala> + 42) scala> + 42)
  4. 4. scala> List(1,2,3,4,5)res0: List[Int] = List(1, 2, 3, 4, 5)scala> List[Unit] = List((), (), (), (), ())scala> scala.collection.parallel.immutable.ParSeq[Unit] = ParVector((),(), (), (), ())
  5. 5. ParArrayParVectormutable.ParHashMapmutable.ParHashSetimmutable.ParHashMapimmutable.ParHashSetParRangeParTrieMap (collection.concurrent.TrieMaps arenew in 2.10)
  6. 6. Caution: Performance benefits visible only around several Thousand elements in the collection Machine ArchitectureDepends on JVM vendor and version Per element workload Specific collection – ParArray, ParTrieMap Specific operation – transformer(filter), accessor (foreach) Memory Management
  7. 7. map, fold and filterscala> val parArray = (1 to 1000000).toArray.par scala> parArray.fold(0)(_+_) res3: Int = 1784293664 scala> val narArray = (1 to 1000000).toArray scala> narArray.fold(0)(_+_) I did not notice res5: Int = 1784293664 Difference on my laptop scala> parArray.fold(0)(_+_) res6: Int = 1784293664
  8. 8. creating a parallel collection import scala.collection.parallel.immutable.ParVector With a new val pv = new ParVector[Int] val pv = Vector(1,2,3,4,5,6,7,8,9).par Taking a sequential collection And converting itParallel collections can be converted back to sequential collections with seq
  9. 9. Collections are inherently sequentialThey are converted to || by copying elements into similar parallel collectionAn example is List– it’s converted into a standard immutable parallelsequence, which is a ParVector. Overhead! Array, Vector, HashMap do not have this overhead
  10. 10. how does it work? Map reduce ? by recursively “splitting” a given collection, applying an operation on each partitionof the collection in parallel, and re-“combining” all of the results that were completedin parallel. Side effecting operations Non Associative operations
  11. 11. scala> var sum =0 side effecting operationsum: Int = 0scala> val list = (1 to 1000).toList.parscala> list.foreach(sum += _); sumres7: Int = 452474scala> var sum =0sum: Int = 0scala> list.foreach(sum += _); sumres8: Int = 497761scala> var sum =0sum: Int = 0scala> list.foreach(sum += _); sumres9: Int = 422508
  12. 12. non-associative operationsThe order in which function is applied to the elements of the collection canbe arbitrary scala> val list = (1 to 1000).toList.par scala> list.reduce(_-_) res01: Int = -228888 scala> list.reduce(_-_) res02: Int = -61000 scala> list.reduce(_-_) res03: Int = -331818
  13. 13. associate but non-commutativescala> val strings = List("abc","def","ghi","jk","lmnop","qrs","tuv","wx","yz").parstrings: scala.collection.parallel.immutable.ParSeq[java.lang.String] = ParVector(abc, def, ghi, jk, lmnop, qrs, tuv, wx, yz)scala> val alphabet = strings.reduce(_++_)alphabet: java.lang.String = abcdefghijklmnopqrstuvwxyz
  14. 14. out of order?Operations may be out of orderBUTRecombination of results would be in order C collection A A B C B A B C
  15. 15. performanceIn computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node;instead, its position in the tree defines the key with which it is associated.
  16. 16. conversions List is converted to vectorConverting parallel to sequential takes constant time
  17. 17. architecture splitters combinersSplit the collection into Is a Builder.Non-trivial partitions so Combines split lists together.That they can be accessedin sequence
  18. 18. brickbatsAbsence of configuration Not all algorithms are parallel friendly unproven Now, if you want your code to not care whether it receives a parallel or sequential collection, you should prefix it with Gen: GenTraversable, GenIterable, GenSeq, etc. These can be either parallel or sequential.