Scala parallel-collections
 

Parallel Collections in Scala. What to keep note of and watchouts.

    Presentation Transcript

    • Parallel Collections with Scala
      Jul 6 2012 > Vikas Hazrati > vikas@knoldus.com > @vhazrati
    • Motivation
        • Multiple cores
        • Popular parallel programming remains a formidable challenge.
        • Implicit parallelism
    • scala> val list = (1 to 10000).toList
      scala> list.map(_ + 42)
      scala> list.par.map(_ + 42)
    • scala> List(1,2,3,4,5)
      res0: List[Int] = List(1, 2, 3, 4, 5)
      scala> res0.map(println(_))
      1
      2
      3
      4
      5
      res1: List[Unit] = List((), (), (), (), ())
      scala> res0.par.map(println(_))
      3
      1
      4
      2
      5
      res2: scala.collection.parallel.immutable.ParSeq[Unit] = ParVector((), (), (), (), ())
    • Available parallel collections:
        • ParArray
        • ParVector
        • mutable.ParHashMap
        • mutable.ParHashSet
        • immutable.ParHashMap
        • immutable.ParHashSet
        • ParRange
        • ParTrieMap (collection.concurrent.TrieMaps are new in 2.10)
    • Caution: performance benefits are visible only at around several thousand elements in the collection. The threshold depends on:
        • Machine architecture
        • JVM vendor and version
        • Per-element workload
        • Specific collection – ParArray, ParTrieMap
        • Specific operation – transformer (filter), accessor (foreach)
        • Memory management
    • map, fold and filter
      scala> val parArray = (1 to 1000000).toArray.par
      scala> parArray.fold(0)(_+_)
      res3: Int = 1784293664
      scala> val narArray = (1 to 1000000).toArray
      scala> narArray.fold(0)(_+_)
      res5: Int = 1784293664
      scala> parArray.fold(0)(_+_)
      res6: Int = 1784293664
      (I did not notice a difference on my laptop.)
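To check on your own machine whether the parallel version pays off, a rough timing sketch along these lines may help. It assumes Scala 2.12 or earlier, where `.par` is part of the standard library (from 2.13 on it lives in the separate scala-parallel-collections module and needs `import scala.collection.parallel.CollectionConverters._`). `TimingSketch` and `timed` are hypothetical names, and a single `System.nanoTime` measurement is crude because it ignores JIT warm-up. The sketch sums `Long` values: the slide's result 1784293664 is the `Int`-overflowed value of the true sum, 500000500000.

```scala
// Assumption: Scala 2.12 or earlier, where .par is built into the standard library.
object TimingSketch {
  // Hypothetical helper: runs `body` once and returns (result, elapsed milliseconds).
  def timed[A](body: => A): (A, Double) = {
    val start  = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }

  def main(args: Array[String]): Unit = {
    val seqArray = (1L to 1000000L).toArray // Longs, so the sum does not overflow
    val parArray = seqArray.par             // parallel view of the same elements

    val (seqSum, seqMs) = timed(seqArray.fold(0L)(_ + _))
    val (parSum, parMs) = timed(parArray.fold(0L)(_ + _))

    println(f"sequential: $seqSum in $seqMs%.2f ms")
    println(f"parallel:   $parSum in $parMs%.2f ms")
  }
}
```

Running it several times, and with heavier per-element work than a plain addition, gives a more honest picture than one run.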
    • creating a parallel collection
      With a new:
        import scala.collection.parallel.immutable.ParVector
        val pv = new ParVector[Int]
      Taking a sequential collection and converting it:
        val pv = Vector(1,2,3,4,5,6,7,8,9).par
      Parallel collections can be converted back to sequential collections with seq.
    • Some collections are inherently sequential. They are converted to parallel by copying their elements into a similar parallel collection. An example is List: it is converted into a standard immutable parallel sequence, which is a ParVector. Overhead! Array, Vector and HashMap do not have this overhead.
    • how does it work?
      Map reduce? By recursively "splitting" a given collection, applying an operation on each partition of the collection in parallel, and re-"combining" all of the results that were completed in parallel.
      Watch out for:
        • Side-effecting operations
        • Non-associative operations
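The split / apply / combine idea can be sketched with plain `Future`s. This is only an illustration, not how the library is implemented: the real collections split recursively and schedule work on a fork/join pool, whereas this sketch cuts the input into flat chunks. `SplitCombineSketch`, `mapReduce` and `chunkSize` are hypothetical names.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SplitCombineSketch {
  // Maps `f` over `xs` chunk by chunk in parallel, then merges the results with `combine`.
  def mapReduce[A, B](xs: Vector[A], chunkSize: Int)(f: A => B)(combine: (B, B) => B): B = {
    require(xs.nonEmpty && chunkSize > 0, "need a non-empty input and a positive chunk size")
    // "Split": cut the collection into consecutive partitions.
    val partitions: Vector[Vector[A]] = xs.grouped(chunkSize).toVector
    // Apply: each partition is processed on the global thread pool, in no particular order.
    val partials: Vector[Future[B]] =
      partitions.map(p => Future(p.map(f).reduce(combine)))
    // "Combine": partial results are gathered in partition order, then merged.
    val results: Vector[B] = Await.result(Future.sequence(partials), 30.seconds)
    results.reduce(combine)
  }
}
```

For example, `SplitCombineSketch.mapReduce((1 to 10000).toVector, 1000)(_ + 42)(_ + _)` gives the same result as `(1 to 10000).map(_ + 42).sum`, because addition is associative and the partial results are merged in partition order.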
    • side-effecting operation
      scala> var sum = 0
      sum: Int = 0
      scala> val list = (1 to 1000).toList.par
      scala> list.foreach(sum += _); sum
      res7: Int = 452474
      scala> var sum = 0
      sum: Int = 0
      scala> list.foreach(sum += _); sum
      res8: Int = 497761
      scala> var sum = 0
      sum: Int = 0
      scala> list.foreach(sum += _); sum
      res9: Int = 422508
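The three different results above come from unsynchronised updates to the shared `var sum`. Two possible ways around this are sketched below, assuming Scala 2.12 or earlier (built-in `.par`); `SideEffectFreeSum` and its method names are hypothetical. The first lets the collection combine partial results itself, the second keeps the shared counter but makes each update atomic.

```scala
// Assumption: Scala 2.12 or earlier, where .par is built into the standard library.
import java.util.concurrent.atomic.AtomicInteger

object SideEffectFreeSum {
  // No shared state: each partition is folded separately and the partial sums are combined.
  def sumWithFold(n: Int): Int =
    (1 to n).toList.par.fold(0)(_ + _)

  // Shared state, but every update is an atomic read-modify-write.
  def sumWithAtomic(n: Int): Int = {
    val total = new AtomicInteger(0)
    (1 to n).toList.par.foreach(x => total.addAndGet(x))
    total.get()
  }
}
```

Both `sumWithFold(1000)` and `sumWithAtomic(1000)` return 500500 on every run. The fold version is usually preferable, since the atomic counter becomes a point of contention between threads.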
    • non-associative operations
      The order in which the function is applied to the elements of the collection can be arbitrary.
      scala> val list = (1 to 1000).toList.par
      scala> list.reduce(_-_)
      res01: Int = -228888
      scala> list.reduce(_-_)
      res02: Int = -61000
      scala> list.reduce(_-_)
      res03: Int = -331818
    • associative but non-commutative
      scala> val strings = List("abc","def","ghi","jk","lmnop","qrs","tuv","wx","yz").par
      strings: scala.collection.parallel.immutable.ParSeq[java.lang.String] = ParVector(abc, def, ghi, jk, lmnop, qrs, tuv, wx, yz)
      scala> val alphabet = strings.reduce(_++_)
      alphabet: java.lang.String = abcdefghijklmnopqrstuvwxyz
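Why subtraction misbehaves while string concatenation does not can be reproduced deterministically, without any parallelism, by regrouping the work by hand. The sketch below reduces each chunk first and then reduces the partial results, which mimics one possible split made by a parallel `reduce`. `GroupingSketch` and `chunkedReduce` are hypothetical names.

```scala
object GroupingSketch {
  // Reduces each chunk with `op`, then reduces the partial results with `op`.
  // Different chunk sizes stand in for different ways of splitting the collection.
  def chunkedReduce[A](xs: Seq[A], chunkSize: Int)(op: (A, A) => A): A =
    xs.grouped(chunkSize).map(_.reduce(op)).reduce(op)

  def main(args: Array[String]): Unit = {
    val nums = 1 to 8
    println(nums.reduce(_ - _))              // -34: plain left-to-right
    println(chunkedReduce(nums, 4)(_ - _))   // 8:  (1-2-3-4) - (5-6-7-8)
    println(chunkedReduce(nums, 2)(_ - _))   // 2:  yet another grouping

    val strings = Seq("abc", "def", "ghi", "jk")
    println(chunkedReduce(strings, 2)(_ + _)) // abcdefghijk: the grouping does not matter
  }
}
```

Subtraction is not associative, so every grouping gives a different answer. Concatenation is associative, and because partial results are merged in their original order, it does not need to be commutative.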
    • out of order?
      Operations may be out of order, BUT recombination of results would be in order.
      [Diagram: a collection with partitions A, B and C]
    • performance
      In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated.
    • conversions
        • List is converted to Vector
        • Converting parallel to sequential takes constant time
    • architecture
        • splitters: split the collection into non-trivial partitions so that they can be accessed in sequence
        • combiners: a combiner is a Builder; it combines split lists together
    • brickbats
        • Absence of configuration
        • Not all algorithms are parallel friendly
        • Unproven
      If you want your code not to care whether it receives a parallel or a sequential collection, declare its parameter with one of the Gen types: GenTraversable, GenIterable, GenSeq, etc. These can be either parallel or sequential.
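A minimal sketch of the Gen idea, assuming Scala 2.10 to 2.12 (the Gen types are deprecated from 2.13 on, where parallel collections moved to a separate module). `GenSketch` and `sumOfSquares` are hypothetical names.

```scala
// Assumption: Scala 2.10 to 2.12, where GenSeq is the common supertype
// of sequential and parallel sequences.
import scala.collection.GenSeq

object GenSketch {
  // Accepts either a sequential or a parallel sequence; the caller decides which.
  def sumOfSquares(xs: GenSeq[Int]): Int =
    xs.map(x => x * x).sum
}
```

Both `GenSketch.sumOfSquares(Vector(1, 2, 3))` and `GenSketch.sumOfSquares(Vector(1, 2, 3).par)` return 14. Code written against a Gen type must not rely on sequential evaluation order, so the function passed to `map` should be free of side effects.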