ParArray
ParVector
mutable.ParHashMap
mutable.ParHashSet
immutable.ParHashMap
immutable.ParHashSet
ParRange
ParTrieMap (collection.concurrent.TrieMaps are new in 2.10)
Caution: performance benefits become visible only at around several thousand elements in the collection. The exact threshold depends on:
Machine architecture
JVM vendor and version
Per-element workload
Specific collection (ParArray, ParTrieMap)
Specific operation: transformer (filter), accessor (foreach)
Memory management
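map, fold and filter
scala> val parArray = (1 to 1000000).toArray.par
scala> parArray.fold(0)(_+_)
res3: Int = 1784293664
scala> val narArray = (1 to 1000000).toArray
scala> narArray.fold(0)(_+_)
res5: Int = 1784293664
scala> parArray.fold(0)(_+_)
res6: Int = 1784293664
I did not notice a difference on my laptop.
(The true sum, 500000500000, does not fit in an Int; 1784293664 is the overflowed value, and it is the same for the parallel and the sequential array.)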
creating a parallel collection
import scala.collection.parallel.immutable.ParVector
With a new:
val pv = new ParVector[Int]
Taking a sequential collection and converting it:
val pv = Vector(1,2,3,4,5,6,7,8,9).par
Parallel collections can be converted back to sequential collections with seq.
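Both ways of creating a parallel collection, and the trip back with seq, can be sketched as below. This assumes Scala 2.10 to 2.12, where .par is part of the standard library; on 2.13 and later it lives in the separate scala-parallel-collections module and needs import scala.collection.parallel.CollectionConverters._. The object name CreateDemo and its members are made up for illustration. Everything is a def so that no parallel operation runs inside the object initializer.

```scala
import scala.collection.parallel.immutable.ParVector

// Illustrative names only; assumes Scala 2.10-2.12 (built-in .par).
object CreateDemo {
  // Built directly through the companion object.
  def direct: ParVector[Int] = ParVector(1, 2, 3)

  // Converted from a sequential Vector.
  def converted: ParVector[Int] = Vector(1, 2, 3, 4, 5, 6, 7, 8, 9).par

  // map may run on several threads, but the result keeps the element order;
  // seq then converts back to a sequential Vector.
  def roundTrip: Vector[Int] = converted.map(_ * 2).seq
}
```

CreateDemo.roundTrip yields the doubled elements in their original order, as an ordinary Vector.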
Some collections are inherently sequential.
They are converted to parallel by copying their elements into a similar parallel collection.
An example is List: it is converted into a standard immutable parallel sequence, which is a ParVector. Overhead!
Array, Vector and HashMap do not have this overhead.
how does it work?
Map reduce? By recursively "splitting" a given collection, applying an operation on each partition of the collection in parallel, and re-"combining" all of the results that were completed in parallel.
Things to watch out for:
Side-effecting operations
Non-associative operations
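The split / apply / combine idea can be modelled with a few lines of plain Scala. This is only a sequential sketch of the principle, not the library's implementation; splitReduce, the threshold parameter and the object name are hypothetical, and the two halves are reduced one after the other here where a parallel collection would hand them to different worker threads.

```scala
// A sequential model of split / apply / combine (hypothetical helper).
object SplitCombineDemo {
  def splitReduce(xs: Vector[Int], threshold: Int)(op: (Int, Int) => Int): Int = {
    require(xs.nonEmpty, "cannot reduce an empty collection")
    require(threshold >= 1, "threshold must be at least 1")
    if (xs.length <= threshold) {
      xs.reduce(op)                                  // small enough: reduce directly
    } else {
      val (left, right) = xs.splitAt(xs.length / 2)  // "split"
      val l = splitReduce(left, threshold)(op)       // apply to each partition
      val r = splitReduce(right, threshold)(op)
      op(l, r)                                       // "combine"
    }
  }

  val data: Vector[Int] = (1 to 1000).toVector
  val total: Int = splitReduce(data, threshold = 16)(_ + _)
}
```

Because addition is associative, total equals data.sum no matter where the splits fall; that property is what the next slides are about.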
side-effecting operation
scala> var sum = 0
sum: Int = 0
scala> val list = (1 to 1000).toList.par
scala> list.foreach(sum += _); sum
res7: Int = 452474
scala> var sum = 0
sum: Int = 0
scala> list.foreach(sum += _); sum
res8: Int = 497761
scala> var sum = 0
sum: Int = 0
scala> list.foreach(sum += _); sum
res9: Int = 422508
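The three different results come from several threads updating the same var without synchronisation. A minimal sketch of ways to get a stable answer, assuming Scala 2.10 to 2.12 (built-in .par); the object and method names are made up for illustration:

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.parallel.immutable.ParSeq

// Illustrative names only; assumes Scala 2.10-2.12 (built-in .par).
object SafeSumDemo {
  def numbers: ParSeq[Int] = (1 to 1000).toList.par

  // No shared mutable state: partial sums are computed per partition
  // and then combined.
  def viaSum: Int = numbers.sum
  def viaFold: Int = numbers.fold(0)(_ + _)

  // If a side effect is really needed, make the update atomic.
  def viaAtomic: Int = {
    val counter = new AtomicInteger(0)
    numbers.foreach(n => counter.addAndGet(n))
    counter.get
  }
}
```

All three return 500500 on every run. The first two are usually preferable, since the atomic counter makes every thread contend for the same memory location.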
non-associative operations
The order in which the function is applied to the elements of the collection can be arbitrary.
scala> val list = (1 to 1000).toList.par
scala> list.reduce(_-_)
res01: Int = -228888
scala> list.reduce(_-_)
res02: Int = -61000
scala> list.reduce(_-_)
res03: Int = -331818
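Why subtraction misbehaves can be seen on four elements without any parallelism: the result depends on how the elements are grouped, and a parallel reduce is free to pick the grouping. The sketch below is plain sequential Scala; the names are only illustrative.

```scala
// Grouping matters for subtraction but not for addition.
object GroupingDemo {
  val xs: Vector[Int] = Vector(1, 2, 3, 4)

  // A sequential reduce groups from the left: ((1 - 2) - 3) - 4
  val leftToRight: Int = xs.reduce(_ - _)                  // -8

  // A parallel run may reduce two halves first: (1 - 2) - (3 - 4)
  val twoHalves: Int = (xs(0) - xs(1)) - (xs(2) - xs(3))   // 0

  // Addition is associative, so both groupings agree.
  val sumLeftToRight: Int = xs.reduce(_ + _)               // 10
  val sumTwoHalves: Int = (xs(0) + xs(1)) + (xs(2) + xs(3)) // 10
}
```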
associative but non-commutative
scala> val strings = List("abc","def","ghi","jk","lmnop","qrs","tuv","wx","yz").par
strings: scala.collection.parallel.immutable.ParSeq[java.lang.String] = ParVector(abc, def, ghi, jk, lmnop, qrs, tuv, wx, yz)
scala> val alphabet = strings.reduce(_++_)
alphabet: java.lang.String = abcdefghijklmnopqrstuvwxyz
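out of order?
Operations may be out of order
BUT
Recombination of results would be in order
[Diagram: a collection with partitions A, B and C; the partitions may be processed in any order, but the results are recombined as A, B, C.]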
performance
In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated.
conversions
List is converted to a ParVector (by copying its elements).
Converting parallel to sequential takes constant time.
architecture
splitters: Split the collection into non-trivial partitions so that they can be accessed in sequence.
combiners: Is a Builder. Combines split lists together.
brickbats
Absence of configuration
Not all algorithms are parallel friendly
Unproven
Now, if you want your code to not care whether it receives a parallel or sequential collection, declare the type it accepts with a Gen-prefixed trait: GenTraversable, GenIterable, GenSeq, etc. These can be either parallel or sequential.
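A minimal sketch of a method written against a Gen type, assuming Scala 2.10 to 2.12, where scala.collection.GenSeq is the common supertype of Seq and ParSeq (the Gen types did not survive the 2.13 collections redesign). GenDemo and sumOfSquares are made-up names.

```scala
import scala.collection.GenSeq

// Illustrative names only; assumes Scala 2.10-2.12.
object GenDemo {
  // Accepts sequential and parallel sequences alike. The caller decides
  // whether the map and sum run in parallel by what it passes in.
  def sumOfSquares(xs: GenSeq[Int]): Int = xs.map(x => x * x).sum
}
```

GenDemo.sumOfSquares(Vector(1, 2, 3)) and GenDemo.sumOfSquares(Vector(1, 2, 3).par) both return 14. Because the method cannot know which variant it gets, the function passed to map should stay free of side effects.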