Parallel Collections with Scala
Jul 6 2012, Vikas Hazrati, vikas@knoldus.com, @vhazrati
Parallel collections in Scala: what to keep in mind and what to watch out for.


Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,702
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
42
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Scala parallel-collections

Slide 1: Parallel Collections with Scala (title slide)

Slide 2: Motivation
- Multi-core processors are popular.
- Parallel programming remains a formidable challenge.
- Implicit parallelism

Slide 3
    scala> val list = (1 to 10000).toList
    scala> list.map(_ + 42)
    scala> list.par.map(_ + 42)

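Slide 4
    scala> List(1,2,3,4,5)
    res0: List[Int] = List(1, 2, 3, 4, 5)

    scala> res0.map(println(_))
    1
    2
    3
    4
    5
    res1: List[Unit] = List((), (), (), (), ())

    scala> res0.par.map(println(_))
    3
    1
    4
    2
    5
    res2: scala.collection.parallel.immutable.ParSeq[Unit] = ParVector((), (), (), (), ())
The parallel map runs println on the elements in an arbitrary order (here 3, 1, 4, 2, 5). Note also that the List has become a ParVector.
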
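The same behaviour as a small standalone program. This is a minimal sketch, assuming Scala 2.12 or earlier, where .par is part of the standard library (on 2.13 and later it needs the separate scala-parallel-collections module plus import scala.collection.parallel.CollectionConverters._). The object name ParMapOrder is hypothetical. The result of map keeps the input order, while side effects such as println may run in any order.

```scala
object ParMapOrder {
  // Elements may be processed in any order on several threads,
  // but the collection returned by map keeps the original element order.
  def squares(xs: List[Int]): List[Int] =
    xs.par.map(x => x * x).toList

  def main(args: Array[String]): Unit = {
    val xs = (1 to 10).toList
    println(squares(xs)) // always printed in input order
    // Side effects run concurrently, so the printed order can change between runs.
    xs.par.foreach(x => println(s"visited $x"))
  }
}
```
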
Slide 5: Parallel collection classes
- ParArray
- ParVector
- mutable.ParHashMap
- mutable.ParHashSet
- immutable.ParHashMap
- immutable.ParHashSet
- ParRange
- ParTrieMap (collection.concurrent.TrieMaps are new in 2.10)

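A sketch of how several of these classes are obtained: calling .par on a collection that has a direct parallel counterpart returns that counterpart's type. It assumes Scala 2.12 or earlier (built-in .par); the object name ParCounterparts is hypothetical, and the type ascriptions only document what .par returns.

```scala
import scala.collection.concurrent.TrieMap
import scala.collection.immutable.HashMap
import scala.collection.parallel.immutable.{ParHashMap, ParRange, ParVector}
import scala.collection.parallel.mutable.{ParArray, ParTrieMap}

object ParCounterparts {
  // Each sequential collection maps onto its own parallel class.
  val parArray: ParArray[Int] = Array(1, 2, 3).par
  val parVector: ParVector[Int] = Vector(1, 2, 3).par
  val parRange: ParRange = (1 to 3).par
  val parHashMap: ParHashMap[String, Int] = HashMap("a" -> 1, "b" -> 2).par
  val parTrieMap: ParTrieMap[String, Int] = TrieMap("a" -> 1, "b" -> 2).par
}
```
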
Slide 6: Caution
Performance benefits become visible only from around several thousand elements in the collection. The benefit also depends on:
- Machine architecture
- JVM vendor and version
- Per-element workload
- The specific collection, e.g. ParArray, ParTrieMap
- The specific operation, e.g. a transformer (filter) or an accessor (foreach)
- Memory management

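One rough way to look for that threshold yourself is to time the same map and sum sequentially and in parallel for several sizes. This is a hypothetical ParTiming helper, assuming Scala 2.12 or earlier. It is a naive single-shot System.nanoTime measurement with no JIT warm-up or repetition, so treat its numbers as indicative only; a benchmarking harness such as JMH gives more trustworthy results.

```scala
object ParTiming {
  // Evaluates body once and prints the elapsed wall-clock time in milliseconds.
  def time[A](label: String)(body: => A): A = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"$label%-20s $elapsedMs%8.2f ms")
    result
  }

  // Hypothetical per-element workload; heavier work tends to favour the parallel version.
  def work(x: Int): Double = math.sqrt(x.toDouble) * math.sin(x.toDouble)

  def main(args: Array[String]): Unit =
    for (n <- Seq(100, 10000, 1000000)) {
      val arr = (1 to n).toArray
      val parArr = arr.par // ParArray wraps the existing array
      time(s"sequential n=$n")(arr.map(work).sum)
      time(s"parallel   n=$n")(parArr.map(work).sum)
    }
}
```

Note that the two sums can differ in the last few bits, because floating-point addition is not associative and the parallel version adds partial sums in a different grouping.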
Slide 7: map, fold and filter
    scala> val parArray = (1 to 1000000).toArray.par
    scala> parArray.fold(0)(_+_)
    res3: Int = 1784293664

    scala> val narArray = (1 to 1000000).toArray
    scala> narArray.fold(0)(_+_)
    res5: Int = 1784293664

    scala> parArray.fold(0)(_+_)
    res6: Int = 1784293664
I did not notice a difference on my laptop.
Note that the exact sum of 1 to 1000000 is 500000500000, which does not fit in an Int: 1784293664 is that value wrapped around modulo 2^32.

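A sketch showing the overflow and one way to avoid it, assuming Scala 2.12 or earlier. fold requires the accumulator to have the element type (Int here), whereas aggregate allows a different accumulator type (Long): seqop adds an element to a partial sum, and combop merges two partial sums. The object name ParFold is hypothetical.

```scala
object ParFold {
  val parArray = (1 to 1000000).toArray.par

  // Int addition silently wraps: 500000500000 does not fit in an Int.
  def intSum: Int = parArray.fold(0)(_ + _)

  // Accumulate in a Long instead:
  // seqop (Long, Int) => Long, combop (Long, Long) => Long.
  def longSum: Long = parArray.aggregate(0L)(_ + _, _ + _)
}
```
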
Slide 8: Creating a parallel collection
    import scala.collection.parallel.immutable.ParVector
With new:
    val pv = new ParVector[Int]
By taking a sequential collection and converting it:
    val pv = Vector(1,2,3,4,5,6,7,8,9).par
Parallel collections can be converted back to sequential collections with seq.

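The creation routes from the slide, plus the companion object's factory method, side by side. This is a sketch assuming Scala 2.12 or earlier; the object name CreatingParallel is hypothetical.

```scala
import scala.collection.parallel.immutable.ParVector

object CreatingParallel {
  // With new: an empty ParVector.
  val empty = new ParVector[Int]
  // From a sequential collection with .par.
  val fromVector: ParVector[Int] = Vector(1, 2, 3, 4, 5, 6, 7, 8, 9).par
  // With the companion object's factory method.
  val direct: ParVector[Int] = ParVector(1, 2, 3)
  // Back to a sequential collection with .seq.
  val backToSequential: Vector[Int] = fromVector.seq
}
```
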
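Slide 9
Some collections are inherently sequential. They are converted to parallel collections by copying their elements into a similar parallel collection. An example is List: it is converted into the standard immutable parallel sequence, a ParVector. This copying is overhead! Array, Vector and HashMap do not have this overhead, because their parallel counterparts (ParArray, ParVector, ParHashMap) can reuse the same underlying data.
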
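A sketch of the difference, assuming the Scala 2.12 (or earlier) standard library implementation. List.par builds a new ParVector by copying every element, which costs O(n) each time it is called. Vector.par simply wraps the existing Vector. The object name ConversionCost is hypothetical.

```scala
import scala.collection.parallel.immutable.ParVector

object ConversionCost {
  val list: List[Int] = (1 to 100000).toList
  val vector: Vector[Int] = list.toVector

  // List has no parallel counterpart: .par copies all elements into a new ParVector.
  val fromList = list.par
  // Vector.par wraps the existing Vector; no elements are copied.
  val fromVector: ParVector[Int] = vector.par

  // When a List must be processed in parallel repeatedly, convert it once and reuse it.
  def twoPasses: (Int, Int) = (fromList.max, fromList.count(_ % 2 == 0))
}
```
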
Slide 10: How does it work?
Map-reduce style: by recursively “splitting” a given collection, applying an operation to each partition of the collection in parallel, and re-“combining” all of the results that were computed in parallel.
Watch out for:
- Side-effecting operations
- Non-associative operations

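To make split / apply / combine concrete, here is a hand-rolled fork/join version of the idea for summing a Vector. It is an illustration of the technique, not the parallel collections library's own code; SumTask, SplitCombine and the threshold of 1000 are hypothetical names and values. It uses java.util.concurrent.RecursiveTask and ForkJoinPool.commonPool() (Java 8+).

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

// Hypothetical illustration of recursive splitting and combining.
final class SumTask(xs: Vector[Int], threshold: Int) extends RecursiveTask[Long] {
  override protected def compute(): Long =
    if (xs.length <= threshold) {
      xs.foldLeft(0L)(_ + _) // small partition: process it sequentially
    } else {
      val (left, right) = xs.splitAt(xs.length / 2) // split
      val leftTask = new SumTask(left, threshold)
      leftTask.fork() // left half runs asynchronously in the pool
      val rightSum = new SumTask(right, threshold).compute() // right half on this thread
      leftTask.join() + rightSum // combine the partial results
    }
}

object SplitCombine {
  def parSum(xs: Vector[Int], threshold: Int = 1000): Long =
    ForkJoinPool.commonPool().invoke(new SumTask(xs, threshold))
}
```

The threshold keeps partitions from becoming so small that task overhead outweighs the work, which is the same reason parallel collections only pay off for larger inputs.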
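Slide 11: Side-effecting operations
    scala> var sum = 0
    sum: Int = 0
    scala> val list = (1 to 1000).toList.par
    scala> list.foreach(sum += _); sum
    res7: Int = 452474
    scala> var sum = 0
    sum: Int = 0
    scala> list.foreach(sum += _); sum
    res8: Int = 497761
    scala> var sum = 0
    sum: Int = 0
    scala> list.foreach(sum += _); sum
    res9: Int = 422508
The correct sum is 500500. Several threads update the shared var sum without synchronization, so updates are lost and each run returns a different, wrong value.
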
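Two ways to fix the race, in a sketch assuming Scala 2.12 or earlier: make the shared update atomic with java.util.concurrent.atomic.AtomicInteger, or (usually better) drop the shared state and let the collection combine partial sums itself. The object name SideEffects is hypothetical.

```scala
import java.util.concurrent.atomic.AtomicInteger

object SideEffects {
  val list = (1 to 1000).toList.par

  // Racy: threads read and write the same var without synchronization,
  // so updates can be lost and the result can differ from run to run.
  def racySum(): Int = {
    var sum = 0
    list.foreach(sum += _)
    sum
  }

  // Thread-safe side effect: each addition is performed atomically.
  def atomicSum(): Int = {
    val sum = new AtomicInteger(0)
    list.foreach(x => sum.addAndGet(x))
    sum.get
  }

  // No shared mutable state: partial sums are computed and combined by the collection.
  def pureSum(): Int = list.sum
}
```
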
Slide 12: Non-associative operations
The order in which the function is applied to the elements of the collection can be arbitrary. Subtraction is not associative, so reduce can return a different result on every run:
    scala> val list = (1 to 1000).toList.par
    scala> list.reduce(_-_)
    res01: Int = -228888
    scala> list.reduce(_-_)
    res02: Int = -61000
    scala> list.reduce(_-_)
    res03: Int = -331818

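When an operation really is order-dependent, one option is to reduce sequentially. This sketch (assuming Scala 2.12 or earlier; NonAssociative is a hypothetical name) contrasts the split-dependent parallel reduce with reduceLeft on the sequential view, which always evaluates ((1 - 2) - 3) - ... - 1000.

```scala
object NonAssociative {
  val list = (1 to 1000).toList.par

  // Result depends on how the collection happens to be split.
  def parallelMinus: Int = list.reduce(_ - _)

  // Deterministic: strictly left to right on the sequential collection.
  def sequentialMinus: Int = list.seq.reduceLeft(_ - _)
}
```
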
Slide 13: Associative but non-commutative
    scala> val strings = List("abc","def","ghi","jk","lmnop","qrs","tuv","wx","yz").par
    strings: scala.collection.parallel.immutable.ParSeq[java.lang.String] = ParVector(abc, def, ghi, jk, lmnop, qrs, tuv, wx, yz)
    scala> val alphabet = strings.reduce(_++_)
    alphabet: java.lang.String = abcdefghijklmnopqrstuvwxyz
String concatenation is associative, so the result is always correct even though the operation is not commutative.

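Slide 14: Out of order?
Operations may be applied out of order, BUT the results are recombined in order.
[Diagram: a collection split into partitions A, B and C; the partitions are processed out of order, and their results are recombined in the order A, B, C.]
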
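A small check of in-order recombination, assuming Scala 2.12 or earlier. Joining two strings with a separator, (a, b) => a + "," + b, is associative but not commutative, so the parallel result can only match the sequential one if partial results are combined left to right. InOrderRecombination is a hypothetical name.

```scala
object InOrderRecombination {
  val numbers = (1 to 20).toVector.par

  // Partitions may be processed in any order, but their results
  // are combined left to right, so the output equals the sequential join.
  def joined: String = numbers.map(_.toString).reduce(_ + "," + _)
}
```
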
Slide 15: Performance
In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated.

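The trie-based concurrent map listed earlier, scala.collection.concurrent.TrieMap, together with its parallel counterpart ParTrieMap, is safe to update from many threads at once. This sketch assumes Scala 2.12 or earlier; TrieMapDemo, squares and sumOfValues are hypothetical names.

```scala
import scala.collection.concurrent.TrieMap

object TrieMapDemo {
  // TrieMap is thread-safe, so parallel inserts need no extra locking.
  def squares(n: Int): TrieMap[Int, Int] = {
    val m = TrieMap.empty[Int, Int]
    (1 to n).par.foreach(i => m.put(i, i * i))
    m
  }

  // .par turns the TrieMap into a ParTrieMap; aggregate sums its values in parallel.
  def sumOfValues(m: TrieMap[Int, Int]): Long =
    m.par.aggregate(0L)((acc, kv) => acc + kv._2, _ + _)
}
```
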
Slide 16: Conversions
- A List is converted to a Vector (a ParVector) when it is made parallel.
- Converting a parallel collection to a sequential one takes constant time.

Slide 17: Architecture
- Splitters: split the collection into non-trivial partitions so that they can be accessed in sequence.
- Combiners: a combiner is a Builder; it combines the split lists back together.

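A sketch that touches both pieces through public API, assuming Scala 2.12 or earlier. On a parallel collection, iterator returns a Splitter whose split method yields partitions in order. ParVector.newCombiner gives a Combiner, which is a Builder with an extra combine method for merging. The hand-rolled "map" below runs sequentially for clarity, so it only mimics what the library does across threads. SplittersAndCombiners is a hypothetical name.

```scala
import scala.collection.parallel.Combiner
import scala.collection.parallel.immutable.ParVector

object SplittersAndCombiners {
  val pv: ParVector[Int] = ParVector(1 to 8: _*)

  // The splitter divides the elements into disjoint partitions, preserving order.
  def partitions: Seq[List[Int]] = pv.iterator.split.map(_.toList)

  // Each partition fills its own combiner; the combiners are then merged in order
  // and turned into the final collection with result().
  def doubled: ParVector[Int] = {
    val combiners: Seq[Combiner[Int, ParVector[Int]]] =
      pv.iterator.split.map { part =>
        val c = ParVector.newCombiner[Int]
        part.foreach(x => c += x * 2)
        c
      }
    combiners.reduce((a, b) => a.combine(b)).result()
  }
}
```
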
Slide 18: Brickbats
- Absence of configuration
- Not all algorithms are parallel-friendly
- Unproven
If you want your code not to care whether it receives a parallel or a sequential collection, use the types prefixed with Gen: GenTraversable, GenIterable, GenSeq, etc. These can be either parallel or sequential.

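A sketch of a method written against GenSeq so that it accepts both kinds of sequence. It assumes Scala 2.12 or earlier; in 2.13 the Gen* types no longer span parallel collections. GenExample and sumOfSquares are hypothetical names. aggregate is used because it works on both: sequentially it folds with seqop, and in parallel it also uses combop to merge partial results.

```scala
import scala.collection.GenSeq

object GenExample {
  // Accepts sequential (Vector, List, ...) and parallel (ParVector, ...) sequences alike.
  def sumOfSquares(xs: GenSeq[Int]): Long =
    xs.aggregate(0L)((acc, x) => acc + x.toLong * x, _ + _)
}
```
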