Scala collections wizardry - Scalapeño



Published in: Technology, Education


  1. Scala collections
     Sagie Davidovich
     @mesagie
     singularityworld.com
     linkedin.com/in/sagied
  2. Warm-up example: Fibonacci sequence

         val fibs: Stream[Int] = 0 #:: fibs.scanLeft(1)(_ + _)

     Key concepts:
     • Recursive values
     • Streams
     • Scan
     • Binary placeholder notation
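
The recursive definition above can be tried out with a minimal sketch like the following. The object name `FibonacciSketch` is only an illustration and not part of the deck. The value is declared as an object member because a self-referencing `val` inside a method body would need to be a `lazy val`. On Scala 2.13 and later, `Stream` is deprecated in favor of `LazyList`, which accepts the same definition.

```scala
object FibonacciSketch {
  // The right-hand side of `#::` is evaluated lazily, so `fibs` can refer to itself.
  // scanLeft(1)(_ + _) yields 1, 1 + fibs(0), (1 + fibs(0)) + fibs(1), ...
  val fibs: Stream[Int] = 0 #:: fibs.scanLeft(1)(_ + _)

  def main(args: Array[String]): Unit =
    // Only the first ten elements are ever computed.
    println(fibs.take(10).toList) // List(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)
}
```

Here `_ + _` is the binary placeholder notation from the slide, short for `(acc, x) => acc + x`.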
  3. Immutable collections
     You’ll learn about:
     • Avoiding memory allocation for empty collections
     • Optimizing for small collections
     • The equals-hashCode contract
     • Asymptotic behavior of common operations
  4. Nil
     List.empty and Nil are singletons; no new memory is allocated.
  5. Option[A]
  6. Immutable Sets – emptySet
     emptySet is a singleton too.
  7. Immutable Sets – Set1
     Optimized for sets of size 1.
  8. Immutable Sets – Set2
     Optimized for sets of size 2.
  9. Immutable Sets – Set4
     Optimized for sets of size 4. Only when a fifth element is added is a HashSet (finally) instantiated.
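
The singleton and small-set optimizations on the preceding slides can be observed with a small sketch. It assumes the specialized classes `Set1`, `Set2` and `Set4` nested in `scala.collection.immutable.Set`; these are implementation details of the standard library, and the object name `SmallCollectionsSketch` is made up for this illustration.

```scala
object SmallCollectionsSketch {
  import scala.collection.immutable.HashSet
  import scala.collection.immutable.Set.{Set1, Set2, Set4}

  // Empty collections are shared instances: `eq` compares references.
  val emptyListIsNil: Boolean   = List.empty[Int] eq Nil
  val emptySetIsShared: Boolean = Set.empty[Int] eq Set.empty[Int]
  val emptyOptionIsNone: Boolean = Option.empty[Int] eq None

  // Small sets are backed by dedicated classes with one field per element.
  val oneElement: Boolean   = Set(1).isInstanceOf[Set1[_]]
  val twoElements: Boolean  = Set(1, 2).isInstanceOf[Set2[_]]
  val fourElements: Boolean = Set(1, 2, 3, 4).isInstanceOf[Set4[_]]

  // A hash-based set only appears once the set outgrows Set4.
  val fiveElements: Boolean = Set(1, 2, 3, 4, 5).isInstanceOf[HashSet[_]]

  def main(args: Array[String]): Unit =
    println(Seq(emptyListIsNil, emptySetIsShared, oneElement, fiveElements))
}
```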
  10. Immutable Collections
  11. Mutable Collections
  12. One-liners
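  13. Computing a derivative

          def derivative(nums: Iterable[Double]) =
            nums.sliding(2).map(pair => pair.last - pair.head)

      (Each window produced by sliding is a collection, not a tuple, so its elements are read with head and last rather than _1 and _2.)
      What can be improved in this solution?
      Bonus question: change a few characters to find the max slope.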
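
One possible answer to the slide's questions, offered as a sketch rather than as the author's intended solution: pairing each element with its successor via `zip` produces real tuples and needs no per-window collection. The names `DerivativeSketch` and `maxSlope` are hypothetical.

```scala
object DerivativeSketch {
  // Pair each element with the next one; drop(1) is safe on empty input.
  def derivative(nums: Seq[Double]): Seq[Double] =
    nums.zip(nums.drop(1)).map { case (a, b) => b - a }

  // The bonus question: the largest difference between neighbours.
  // Note that `max` throws on input with fewer than two elements.
  def maxSlope(nums: Seq[Double]): Double =
    derivative(nums).max

  def main(args: Array[String]): Unit = {
    val squares = Seq(1.0, 4.0, 9.0, 16.0)
    println(derivative(squares)) // List(3.0, 5.0, 7.0)
    println(maxSlope(squares))   // 7.0
  }
}
```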
  14. Counting occurrences (histogram)

          "encyclopedia" groupBy identity mapValues (_.size)

      Result:

          Map(e -> 2, n -> 1, y -> 1, a -> 1, i -> 1, l -> 1, p -> 1, c -> 2, o -> 1, d -> 1)
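
A self-contained variant of the histogram one-liner is sketched below. It replaces `mapValues` with a plain `map` over the key-value pairs, an assumption made here so that the result is a strict `Map` on every Scala version (on 2.13 and later, `mapValues` is deprecated and returns a view). The name `HistogramSketch` is illustrative only.

```scala
object HistogramSketch {
  // groupBy(identity) collects equal characters under one key;
  // the size of each group is the number of occurrences.
  def histogram(s: String): Map[Char, Int] =
    s.groupBy(identity).map { case (ch, occurrences) => ch -> occurrences.length }

  def main(args: Array[String]): Unit =
    println(histogram("encyclopedia"))
}
```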
  15. Word n-grams

          val range = 1 to 3
          val text = "hello sweet world"
          val tokenize = (s: String) => s.split(" ")
          range flatMap (size => tokenize(text) sliding size)

      Result:

          Vector(Array(hello), Array(sweet), Array(world),
                 Array(hello, sweet), Array(sweet, world),
                 Array(hello, sweet, world))
  16. Are all members of a greater than the corresponding members of b?

          val a = List(2, 3, 4)
          val b = List(1, 2, 3)

          // O(n^2) and not very elegant.
          (0 until a.size) forall (i => a(i) > b(i))

          // O(n), but creates tuples and a temporary list. Yet, more elegant.
          a zip b forall (x => x._1 > x._2)

          // Same as above, but doesn't create a temporary list (lazy).
          a.view zip b forall (x => x._1 > x._2)

          // O(n), without tuple or temporary list creation, and even more elegant.
          (a corresponds b)(_ > _)
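
The variants above agree when both lists have the same length. A short sketch (with the made-up object name `CorrespondsSketch`) shows where they differ: `corresponds` also requires the two sequences to have equal length, whereas `zip` silently truncates to the shorter one.

```scala
object CorrespondsSketch {
  val a = List(2, 3, 4)
  val b = List(1, 2, 3)

  // Indexed access on a List is linear, so this loop is quadratic overall.
  val byIndex: Boolean = (0 until a.size).forall(i => a(i) > b(i))

  // Linear, but builds an intermediate list of tuples.
  val byZip: Boolean = a.zip(b).forall { case (x, y) => x > y }

  // Linear, walks both lists in lockstep without intermediate structures.
  val byCorresponds: Boolean = a.corresponds(b)(_ > _)

  // With unequal lengths the two approaches disagree.
  val shorter = List(1, 2)
  val zipIgnoresExtra: Boolean = a.zip(shorter).forall { case (x, y) => x > y }
  val correspondsChecksLength: Boolean = a.corresponds(shorter)(_ > _)

  def main(args: Array[String]): Unit =
    println((byIndex, byZip, byCorresponds, zipIgnoresExtra, correspondsChecksLength))
}
```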
  17. Strings are collections. How come?

          "abc".max

      An implicit conversion wraps the String in StringOps, which provides the collection methods:

          @inline implicit def augmentString(x: String) = new StringOps(x)

      String <% StringOps <: StringLike <: IndexedSeqOptimized …
  18. Complexity of collection operations
      • Linear:
        – Unary, O(n):
          • Mappers: map, collect
          • Reducers: reduce, foldLeft, foldRight
          • Others: foreach, filter, indexOf, reverse, find, mkString
        – Binary, O(n + m):
          • union, diff, and intersect
  19. Immutable Collections time complexity
      (C = constant, eC = effectively constant, aC = amortized constant, L = linear, - = not supported)

                  head  tail  apply  update  prepend  append
          List    C     C     L      L       C        L
          Stream  C     C     L      L       C        L
          Vector  eC    eC    eC     eC      eC       eC
          Stack   C     C     L      L       C        L
          Queue   aC    aC    L      L       L        C
          Range   C     C     C      -       -        -
          String  C     L     C      L       L        L
  20. Mutable Collections time complexity
      (C = constant, aC = amortized constant, L = linear, - = not supported)

                         head  tail  apply  update  prepend  append  insert
          ArrayBuffer    C     L     C      C       L        aC      L
          ListBuffer     C     L     L      L       C        C       L
          StringBuilder  C     L     C      C       L        aC      L
          MutableList    C     L     L      L       C        C       L
          Queue          C     L     L      L       C        C       L
          ArraySeq       C     L     C      C       -        -       -
          Stack          C     L     L      L       C        L       L
          ArrayStack     C     L     C      C       aC       L       L
          Array          C     L     C      C       -        -       -
  21. Bonus question
      What’s the complexity of Range.sum?
  22. Range
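
The bonus question hinges on the fact that a range is an arithmetic progression, whose sum has a closed form: n * (first + last) / 2. The sketch below illustrates that formula. It is not a copy of the library source, and `closedFormSum` is a hypothetical helper name.

```scala
object RangeSumSketch {
  // Constant-time sum of an arithmetic progression.
  // The product is computed as a Long to avoid intermediate overflow.
  def closedFormSum(r: Range): Int =
    if (r.isEmpty) 0
    else ((r.size.toLong * (r.head.toLong + r.last.toLong)) / 2).toInt

  def main(args: Array[String]): Unit = {
    println(closedFormSum(1 to 100))      // 5050
    println(closedFormSum(10 to 1 by -3)) // 10 + 7 + 4 + 1 = 22
  }
}
```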
  23. Equals-hashCode contract
      (a equals b) ⇒ (a.hashCode == b.hashCode)
      All Scala collections implement the contract.
      • Bad idea: Set[Array[Int]]
      • Good idea: Set[Vector[Int]]
      • Bad idea: Set[ArrayBuffer[Int]]
      • Bad idea: Set[collection.mutable._]
      • Good idea: Set[collection.immutable._]
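
Why `Set[Array[Int]]` is a bad idea can be seen in a short sketch: Java arrays compare by reference, so a structurally identical array is not found, while `Vector` compares by content. The object name `EqualityContractSketch` is illustrative only. Mutable collections are a bad idea for a different reason that is not exercised here: their hash code changes when their content changes after insertion.

```scala
object EqualityContractSketch {
  // Arrays: equality is reference identity, so lookup by content fails.
  val arraySetFinds: Boolean = Set(Array(1, 2)).contains(Array(1, 2))

  // Vectors: equality and hashCode are both derived from the elements.
  val vectorSetFinds: Boolean = Set(Vector(1, 2)).contains(Vector(1, 2))
  val vectorHashesAgree: Boolean = Vector(1, 2).hashCode == Vector(1, 2).hashCode

  def main(args: Array[String]): Unit =
    println((arraySetFinds, vectorSetFinds, vectorHashesAgree)) // (false,true,true)
}
```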
  24. More on collections equality

          val (a, b) = (1 to 3, List(1, 2, 3))
          a == b // true

      Q: Wait, how efficient is Range.hashCode?
      A: O(n)

          override def hashCode = util.hashing.MurmurHash3.seqHash(seq)

      Challenge yourself: is there a closed-form, O(1), formula for a range's hashCode?
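
The consequence of the contract for sequences of different concrete types can be checked directly. In this sketch (the object name `SeqEqualitySketch` is hypothetical), a `Range` and a `List` with the same elements are equal, so their hash codes must agree as well.

```scala
object SeqEqualitySketch {
  val range = 1 to 3
  val list  = List(1, 2, 3)

  // Sequences are equal when they hold the same elements in the same order,
  // regardless of the concrete class.
  val equalAcrossTypes: Boolean = range == list

  // Equal objects must produce equal hash codes.
  val equalHashes: Boolean = range.hashCode == list.hashCode

  def main(args: Array[String]): Unit =
    println((equalAcrossTypes, equalHashes)) // (true,true)
}
```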
  25. Java interoperability
      Implicit (less boilerplate):

          import collection.JavaConversions._
          javaCollection.filter(…)

      Explicit (better control):

          import collection.JavaConverters._
          javaCollection.asScala.filter(…)
          scalaCollection.asJava
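
The explicit style can be sketched end to end as follows. The example uses `scala.collection.JavaConverters`, the object named on the slide; on Scala 2.13 and later it is deprecated in favor of `scala.jdk.CollectionConverters`, which offers the same `asScala` and `asJava` methods. The object name `JavaInteropSketch` and the sample data are made up for illustration.

```scala
object JavaInteropSketch {
  import scala.collection.JavaConverters._

  // A plain Java collection, as it might arrive from a Java API.
  val javaList = new java.util.ArrayList[Int]()
  (1 to 4).foreach(i => javaList.add(i))

  // asScala wraps the Java list so Scala collection methods become available.
  val evens: List[Int] = javaList.asScala.filter(_ % 2 == 0).toList

  // asJava wraps a Scala collection for consumption by Java code.
  val backToJava: java.util.List[Int] = evens.asJava

  def main(args: Array[String]): Unit =
    println(evens) // List(2, 4)
}
```

Both conversions are wrappers rather than copies, which is part of the "better control" the slide refers to: the conversion points are visible in the code.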
  26. The power of type-level programming
      Graph path-finding at compile time

          import scala.language.implicitConversions

          // Vertices
          case class A(l: List[Char])
          case class B(l: List[Char])
          case class C(l: List[Char])
          case class D(l: List[Char])
          case class E(l: List[Char])

          // Edges
          implicit def ad[A1 <% A](x: A1) = D(x.l :+ 'A')
          implicit def bc[B1 <% B](x: B1) = C(x.l :+ 'B')
          implicit def ce[C1 <% C](x: C1) = E(x.l :+ 'C')
          implicit def ea[E1 <% E](x: E1) = A(x.l :+ 'E')

          def pathFrom(end: D) = end

          pathFrom(B(Nil)) // res0: D = D(List(B, C, E, A))
  27. Want to go pro?
      • Shapeless (Miles Sabin)
        – Polytypic programming & heterogeneous lists
        – github.com/milessabin/shapeless
      • Scalaxy (Olivier Chafik)
        – Macros for boosting performance of collections
        – github.com/ochafik/Scalaxy
