Scala parallel-collections

Parallel Collections in Scala. What to keep note of and watchouts.

Usage Rights

© All Rights Reserved

    Scala parallel-collections: Presentation Transcript

    • Parallel Collections with Scala. Jul 6 2012 > Vikas Hazrati > vikas@knoldus.com > @vhazrati
    • Motivation
        • Multiple cores
        • Popular parallel programming remains a formidable challenge.
        • Implicit parallelism
    • scala> val list = (1 to 10000).toList
      scala> list.map(_ + 42)
      scala> list.par.map(_ + 42)
    • scala> List(1,2,3,4,5)
      res0: List[Int] = List(1, 2, 3, 4, 5)
      scala> res0.map(println(_))
      1
      2
      3
      4
      5
      res1: List[Unit] = List((), (), (), (), ())
      scala> res0.par.map(println(_))
      3
      1
      4
      2
      5
      res2: scala.collection.parallel.immutable.ParSeq[Unit] = ParVector((), (), (), (), ())
    • Available parallel collections:
        • ParArray
        • ParVector
        • mutable.ParHashMap
        • mutable.ParHashSet
        • immutable.ParHashMap
        • immutable.ParHashSet
        • ParRange
        • ParTrieMap (collection.concurrent.TrieMaps are new in 2.10)
    • Caution: performance benefits become visible only at around several thousand elements in the collection. The threshold depends on:
        • Machine architecture
        • JVM vendor and version
        • Per-element workload
        • Specific collection: ParArray, ParTrieMap
        • Specific operation: transformer (filter), accessor (foreach)
        • Memory management
    • map, fold and filter
      scala> val parArray = (1 to 1000000).toArray.par
      scala> parArray.fold(0)(_+_)
      res3: Int = 1784293664
      scala> val narArray = (1 to 1000000).toArray
      scala> narArray.fold(0)(_+_)
      res5: Int = 1784293664
      scala> parArray.fold(0)(_+_)
      res6: Int = 1784293664
      I did not notice a difference on my laptop.
      (The true sum, 500000500000, overflows Int, which is why both the parallel and the sequential fold print 1784293664.)
    • creating a parallel collection
      With new:
      import scala.collection.parallel.immutable.ParVector
      val pv = new ParVector[Int]
      Taking a sequential collection and converting it:
      val pv = Vector(1,2,3,4,5,6,7,8,9).par
      Parallel collections can be converted back to sequential collections with seq.
    • Collections are inherently sequential. They are converted to parallel by copying their elements into a similar parallel collection. An example is List: it is converted into a standard immutable parallel sequence, which is a ParVector. Overhead! Array, Vector and HashMap do not have this overhead.
    • how does it work? Map reduce? By recursively "splitting" a given collection, applying an operation to each partition of the collection in parallel, and "re-combining" all of the results that were completed in parallel. Two things to watch out for: side-effecting operations and non-associative operations.
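The split / apply / re-combine idea above can be sketched with plain Futures. This is only an illustration of the shape of the algorithm, not how the library implements it; `SplitCombineSketch`, `parMap` and `threshold` are hypothetical names introduced here.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SplitCombineSketch {
  // Hypothetical helper, not part of the Scala library.
  // Splits `xs` in half until a part has at most `threshold` elements,
  // maps each part inside a Future, then concatenates the partial results
  // left-to-right so the original element order is preserved.
  def parMap[A, B](xs: Vector[A], threshold: Int)(f: A => B): Future[Vector[B]] = {
    require(threshold >= 1, "threshold must be at least 1")
    if (xs.length <= threshold) Future(xs.map(f))
    else {
      val (left, right) = xs.splitAt(xs.length / 2)
      // Both halves are started before either result is awaited.
      val l = parMap(left, threshold)(f)
      val r = parMap(right, threshold)(f)
      for (a <- l; b <- r) yield a ++ b
    }
  }

  def main(args: Array[String]): Unit = {
    val input = (1 to 10000).toVector
    val result = Await.result(parMap(input, 1000)(_ + 42), 10.seconds)
    println(result.take(3))               // Vector(43, 44, 45)
    println(result == input.map(_ + 42))  // true
  }
}
```

The parts may finish in any order, but because the combine step always appends the left result before the right one, the output order matches the input order.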
    • side-effecting operation
      scala> var sum = 0
      sum: Int = 0
      scala> val list = (1 to 1000).toList.par
      scala> list.foreach(sum += _); sum
      res7: Int = 452474
      scala> var sum = 0
      sum: Int = 0
      scala> list.foreach(sum += _); sum
      res8: Int = 497761
      scala> var sum = 0
      sum: Int = 0
      scala> list.foreach(sum += _); sum
      res9: Int = 422508
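The runs above lose updates because several threads write to the same var. A minimal sketch of three ways to get the expected total of 500500 follows. It assumes Scala 2.12 or earlier, where .par is part of the standard library; on 2.13 and later the separate scala-parallel-collections module and `import scala.collection.parallel.CollectionConverters._` are needed. `SafeSum` and its method names are hypothetical.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Assumption: Scala 2.12 or earlier, so .par needs no extra import.
object SafeSum {
  private val list = (1 to 1000).toList.par

  // No shared state: the library combines the partial sums itself.
  def viaSum: Int = list.sum

  // Also safe: + on Int is associative and 0 is its neutral element.
  def viaFold: Int = list.fold(0)(_ + _)

  // If a side effect is unavoidable, make the shared update atomic.
  def viaAtomic: Int = {
    val counter = new AtomicInteger(0)
    list.foreach(n => counter.addAndGet(n))
    counter.get
  }

  def main(args: Array[String]): Unit = {
    println(viaSum)    // 500500
    println(viaFold)   // 500500
    println(viaAtomic) // 500500
  }
}
```

The first two variants are usually preferable, since the atomic counter serializes the updates and gives up part of the parallel speed-up.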
    • non-associative operations
      The order in which the function is applied to the elements of the collection can be arbitrary.
      scala> val list = (1 to 1000).toList.par
      scala> list.reduce(_-_)
      res01: Int = -228888
      scala> list.reduce(_-_)
      res02: Int = -61000
      scala> list.reduce(_-_)
      res03: Int = -331818
    • associative but non-commutative
      scala> val strings = List("abc","def","ghi","jk","lmnop","qrs","tuv","wx","yz").par
      strings: scala.collection.parallel.immutable.ParSeq[java.lang.String] = ParVector(abc, def, ghi, jk, lmnop, qrs, tuv, wx, yz)
      scala> val alphabet = strings.reduce(_++_)
      alphabet: java.lang.String = abcdefghijklmnopqrstuvwxyz
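Why subtraction gives varying answers while string concatenation does not can be reproduced deterministically, without any threads, by reducing fixed-size chunks first and then reducing the partial results. This mimics one possible grouping a parallel reduce might choose; `AssociativityDemo` and `chunkedReduce` are hypothetical names for this sketch.

```scala
object AssociativityDemo {
  // Hypothetical helper: reduce each chunk, then reduce the partial results.
  // Different chunk sizes stand in for the different groupings that a
  // parallel reduce is free to pick. Requires a non-empty `xs`.
  def chunkedReduce[A](xs: Seq[A], chunkSize: Int)(op: (A, A) => A): A =
    xs.grouped(chunkSize).map(_.reduce(op)).reduce(op)

  def main(args: Array[String]): Unit = {
    val nums = (1 to 10).toList
    // Subtraction is not associative: the grouping changes the answer.
    println(chunkedReduce(nums, 10)(_ - _)) // -53 (plain left-to-right)
    println(chunkedReduce(nums, 5)(_ - _))  // 15
    println(chunkedReduce(nums, 2)(_ - _))  // 3

    val strings = List("abc", "def", "ghi", "jk")
    // Concatenation is associative: any grouping gives the same string,
    // as long as the parts stay in their original order.
    println(chunkedReduce(strings, 4)(_ + _)) // abcdefghijk
    println(chunkedReduce(strings, 2)(_ + _)) // abcdefghijk
    println(chunkedReduce(strings, 1)(_ + _)) // abcdefghijk
  }
}
```

Concatenation is not commutative ("abc" + "def" differs from "def" + "abc"), yet the result is still stable because only the grouping varies, never the left-to-right order of the operands.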
    • out of order? Operations may be executed out of order, BUT the recombination of results is in order. [Diagram: a collection split into parts A, B and C, which are recombined as A B C.]
    • performance
      In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated.
    • conversions
        • List is converted to Vector.
        • Converting parallel to sequential takes constant time.
    • architecture
        • Splitters: split the collection into non-trivial partitions so that they can be accessed in sequence.
        • Combiners: a combiner is a Builder. It combines the split lists together.
    • brickbats
        • Absence of configuration
        • Not all algorithms are parallel friendly
        • Unproven
      Now, if you want your code to not care whether it receives a parallel or a sequential collection, declare the collection's type with a Gen-prefixed trait: GenTraversable, GenIterable, GenSeq, etc. These can be either parallel or sequential.
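A minimal sketch of the Gen-prefixed approach: a method that takes a GenSeq and therefore accepts both a List and a parallel vector. It assumes Scala 2.12 or earlier; from 2.13 the Gen* traits are deprecated and parallel collections live in a separate module. `GenDemo` and `total` are hypothetical names.

```scala
// Assumption: Scala 2.12 or earlier, where GenSeq is the common supertype
// of sequential and parallel sequences and .par is in the standard library.
import scala.collection.GenSeq

object GenDemo {
  // Works for either kind of sequence; fold is safe here because
  // + on Int is associative and 0 is its neutral element.
  def total(xs: GenSeq[Int]): Int = xs.fold(0)(_ + _)

  def main(args: Array[String]): Unit = {
    println(total(List(1, 2, 3, 4, 5)))        // 15
    println(total(Vector(1, 2, 3, 4, 5).par))  // 15
  }
}
```

Code written against a Gen type has to stay free of side effects and order assumptions, since the caller decides whether it runs in parallel.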