Scala collections
Sagie Davidovich
@mesagie
singularityworld.com
linkedin.com/in/sagied
Warm up example:
Fibonacci sequence
val fibs: Stream[Int] = 0 #:: fibs.scanLeft(1)(_ + _)
Key concepts:
• Recursive values
• Streams
• Scan
• Binary place-holder notation
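A minimal sketch of the same warm-up, assuming Scala 2.13 or later, where LazyList supersedes Stream. The names firstTen and runningTotals are illustrative additions. scanLeft emits every intermediate accumulator value, which is what lets the recursive value feed on itself.

```scala
// Assumption: Scala 2.13+, where LazyList replaces the deprecated Stream.
// `lazy val` lets the definition refer to itself in any scope.
lazy val fibs: LazyList[Int] = 0 #:: fibs.scanLeft(1)(_ + _)

// Only the requested prefix is ever computed.
val firstTen: List[Int] = fibs.take(10).toList

// scanLeft on its own: the running totals of a list, seed included.
// (_ + _) is the binary placeholder form of (acc, x) => acc + x.
val runningTotals: List[Int] = List(1, 2, 3, 4).scanLeft(0)(_ + _)
```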
Immutable collections
You’ll learn about
• Avoid memory allocation for empty collections
• Optimize for small collections
• Equals-hashCode contract
• Asymptotic behavior of common operations
Nil
List.empty and Nil are the same singleton object.
Creating an empty list allocates no new memory
Option[A]
Immutable Sets – emptySet
emptySet is a singleton too
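One way to check the singleton claims is reference equality (eq). This is a small sketch; the value names are illustrative.

```scala
// `eq` compares references, so `true` means the very same object is reused.
val nilIsShared: Boolean = List.empty[Int] eq Nil

// The element type does not matter: one empty set serves every Set[A].
val emptySetIsShared: Boolean = Set.empty[Int] eq Set.empty[String]

// None plays the same role for Option[A].
val noneIsShared: Boolean = Option.empty[Int] eq None
```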
Immutable Sets – Set1
Optimized for sets of size 1
Immutable Sets – Set2
Optimized for sets of size 2
Immutable Sets – Set4
Optimized for sets of size 4. Adding a fifth element (finally) instantiates a HashSet
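The small-set optimization can be observed from the runtime classes. This is a sketch assuming Scala 2.13; Set1 to Set4 are implementation details and may differ between versions.

```scala
import scala.collection.immutable

// Up to four elements, the standard factory returns a dedicated small class.
val isSet1: Boolean = Set(1).isInstanceOf[immutable.Set.Set1[_]]
val isSet2: Boolean = Set(1, 2).isInstanceOf[immutable.Set.Set2[_]]
val isSet4: Boolean = Set(1, 2, 3, 4).isInstanceOf[immutable.Set.Set4[_]]

// The fifth element triggers the switch to a hash-based implementation.
val isHashSet: Boolean = Set(1, 2, 3, 4, 5).isInstanceOf[immutable.HashSet[_]]
```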
Immutable Collections
Mutable Collections
One liners
Computing a derivative
def derivative(nums: Iterable[Double]) =
  nums.sliding(2)
    .map(pair => pair.last - pair.head) // each window is a collection, not a tuple
What can be improved in this solution?
Bonus question: change a few characters to find the
max slope
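One possible variation, sketched under the assumption of Scala 2.13+ (for maxOption). It pairs each element with its successor using zip, so inputs with fewer than two elements yield an empty result. The name maxSlope is hypothetical.

```scala
// Differences between consecutive elements, via zip instead of sliding.
def derivative(nums: Seq[Double]): Seq[Double] =
  nums.zip(nums.drop(1)).map { case (prev, next) => next - prev }

// maxOption avoids an exception when there are no slopes at all.
def maxSlope(nums: Seq[Double]): Option[Double] =
  derivative(nums).maxOption

val slopes = derivative(Seq(1.0, 4.0, 9.0, 16.0))
val steepest = maxSlope(Seq(1.0, 4.0, 9.0, 16.0))
val noSlopes = derivative(Seq(1.0))
```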
Counting occurrences (histogram)
"encyclopedia" groupBy identity mapValues (_.size)
Map (
e -> 2, n -> 1, y -> 1, a -> 1, i -> 1,
l -> 1, p -> 1, c -> 2, o -> 1, d -> 1
)
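On Scala 2.13+, mapValues is deprecated and returns a lazy view. Two equivalent spellings are sketched below, assuming that version; the value names are illustrative.

```scala
// Group, map and reduce in a single pass (Scala 2.13+).
val histogram: Map[Char, Int] =
  "encyclopedia".toSeq.groupMapReduce(identity)(_ => 1)(_ + _)

// The original one-liner, made strict again with .view ... .toMap.
val viaGroupBy: Map[Char, Int] =
  "encyclopedia".toSeq.groupBy(identity).view.mapValues(_.size).toMap
```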
Word n-grams
val range = 1 to 3
val text = "hello sweet world"
val tokenize = (s: String) => s.split(" ")
range flatMap (size => tokenize(text) sliding size)
Result:
Vector(Array(hello), Array(sweet), Array(world), Array(hello,
sweet), Array(sweet, world), Array(hello, sweet, world))
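Arrays do not print their contents, and they compare by reference. The sketch below is a variation that joins each window into a String so the result is easy to read and compare. The names tokens and ngrams are illustrative.

```scala
val text = "hello sweet world"
val tokens: Vector[String] = text.split(" ").toVector

// For every window size, slide over the tokens and join each window.
val ngrams: Vector[String] =
  (1 to 3).toVector.flatMap(size => tokens.sliding(size).map(_.mkString(" ")))
```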
Are all members of a greater than the
corresponding members of b?
val a = List(2,3,4)
val b = List(1,2,3)
// O(n^2) and not very elegant.
(0 until a.size) forall (i => a(i) > b(i))
// O(n) but creates tuples and a temporary list. Yet, more elegant.
a zip b forall (x=> x._1 > x._2)
// same as above but doesn't create a temporary list (lazy)
a.view zip b forall (x=> x._1 > x._2)
// O(n), without tuple or temporary list creation, and even more elegant.
(a corresponds b)(_ > _)
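One difference worth noting: zip silently truncates to the shorter collection, while corresponds also requires equal lengths. A small sketch, assuming Scala 2.13+ for lazyZip; the list named longer is an illustrative addition.

```scala
val a = List(2, 3, 4)
val b = List(1, 2, 3)

val viaCorresponds: Boolean = a.corresponds(b)(_ > _)
// lazyZip (Scala 2.13+) also avoids the intermediate list of tuples.
val viaLazyZip: Boolean = a.lazyZip(b).forall(_ > _)

// With unequal lengths the two approaches disagree.
val longer = List(2, 3, 4, 5)
val zipIgnoresExtra: Boolean = longer.zip(b).forall { case (x, y) => x > y }
val correspondsChecksLength: Boolean = longer.corresponds(b)(_ > _)
```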
Strings are collections. How come?
"abc".max
@inline implicit def augmentString(x: String) =
new StringOps(x)
String <% StringOps <: StringLike <: IndexedSeqOptimized …
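A few collection operations applied directly to a String, as a sketch. The implicit conversion is what makes these calls compile; the exact trait hierarchy shown above was reorganized in the Scala 2.13 collections redesign, so treat it as version-specific.

```scala
// max comes from the collection API, not from java.lang.String.
val largest: Char = "abc".max

// Mapping Char => Char gives back a String.
val shouted: String = "abc".map(_.toUpper)

// count takes a predicate over the characters.
val vowelCount: Int = "encyclopedia".count(c => "aeiou".indexOf(c) >= 0)
```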
Complexity of collection operations
• Linear:
– Unary: O(n):
• Mappers: map, collect
• Reducers: reduce, foldLeft, foldRight
• Others: foreach, filter, indexOf, reverse, find, mkString
– Binary: O(n + m):
• union, diff, and intersect
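A short sketch of the binary operations. On sequences, diff and intersect follow multiset semantics; on sets they are the usual set operations. The sample values are illustrative.

```scala
val xs = List(1, 2, 2, 3)
val ys = List(2, 3, 4)

// Seq: each element of ys cancels at most one occurrence in xs.
val seqDiff = xs.diff(ys)
val seqIntersect = xs.intersect(ys)
// For sequences, "union" is concatenation.
val seqConcat = xs ++ ys

// Set: ordinary set algebra.
val setUnion = Set(1, 2, 3) union Set(2, 3, 4)
val setIntersect = Set(1, 2, 3) intersect Set(2, 3, 4)
```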
Immutable Collections time complexity
C = constant, eC = effectively constant, aC = amortized constant, L = linear, - = not supported
         head  tail  apply  update  prepend  append
List     C     C     L      L       C        L
Stream   C     C     L      L       C        L
Vector   eC    eC    eC     eC      eC       eC
Stack    C     C     L      L       C        L
Queue    aC    aC    L      L       L        C
Range    C     C     C      -       -        -
String   C     L     C      L       L        L
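Constant-time prepend on a List follows from structural sharing: the new cell points at the existing list. A small sketch with illustrative names:

```scala
val list = List(2, 3)

// Prepend allocates one cell; the old list becomes the tail, unchanged.
val prepended = 1 :: list
val tailIsShared: Boolean = prepended.tail eq list

// Append must copy every cell of the original list (linear).
val appended = list :+ 4

// Vector.updated returns a new vector and leaves the original intact.
val vec = Vector(1, 2, 3)
val updated = vec.updated(1, 20)
```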
Mutable Collections time complexity
C = constant, aC = amortized constant, L = linear, - = not supported
               head  tail  apply  update  prepend  append  insert
ArrayBuffer    C     L     C      C       L        aC      L
ListBuffer     C     L     L      L       C        C       L
StringBuilder  C     L     C      C       L        aC      L
MutableList    C     L     L      L       C        C       L
Queue          C     L     L      L       C        C       L
ArraySeq       C     L     C      C       -        -       -
Stack          C     L     L      L       C        L       L
ArrayStack     C     L     C      C       aC       L       L
Array          C     L     C      C       -        -       -
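A sketch contrasting the two most common buffers, with the costs from the table noted in comments. The value names are illustrative.

```scala
import scala.collection.mutable.{ArrayBuffer, ListBuffer}

val array = ArrayBuffer(1, 2, 3)
array += 4      // append: amortized constant
array(0) = 10   // update by index: constant

val list = ListBuffer(2, 3)
list.prepend(1) // prepend: constant
list += 4       // append: constant
val frozen: List[Int] = list.toList
```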
Bonus question
What’s the complexity
of Range.sum?
Range
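A Range is fully described by its first element, last element and length, so its sum can be computed with the arithmetic series formula. The sketch below uses a hypothetical helper named arithmeticSum; whether the library's own sum takes this shortcut may depend on the Scala version and the Numeric instance in use.

```scala
// Sum of an arithmetic progression: length * (first + last) / 2.
// Long arithmetic guards against Int overflow in the intermediate product.
def arithmeticSum(r: Range): Long =
  if (r.isEmpty) 0L
  else r.length.toLong * (r.head.toLong + r.last.toLong) / 2

val gauss = arithmeticSum(1 to 100)
val stepped = arithmeticSum(0 until 10 by 3)   // 0, 3, 6, 9
val descending = arithmeticSum(10 to 1 by -2)  // 10, 8, 6, 4, 2
```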
Equals-hashCode contract
(a equals b) ⇒ (a.hashCode == b.hashCode)
All Scala collections implement the contract
Bad idea: Set[Array[Int]]
Good idea: Set[Vector[Int]]
Bad idea: Set[ArrayBuffer[Int]]
Bad idea: Set[collection.mutable._]
Good idea: Set[collection.immutable._]
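A sketch of why arrays make poor set elements: Array inherits Java's reference-based equals and hashCode, while Vector compares by content. The value names are illustrative.

```scala
// Two arrays with the same contents are still two distinct elements.
val arrays = Set(Array(1, 2), Array(1, 2))

// Two equal vectors collapse into one element.
val vectors = Set(Vector(1, 2), Vector(1, 2))

// Content comparison for arrays has to be requested explicitly.
val sameContents: Boolean = Array(1, 2).sameElements(Array(1, 2))
```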
More on collections equality
val (a, b) = (1 to 3, List(1, 2, 3))
a == b // true
Q: Wait, how efficient is Range.hashCode?
A: O(n)
override def hashCode = util.hashing.MurmurHash3.seqHash(seq)
Challenge yourself:
Is there a closed-form (O(1)) formula for a Range's hashCode?
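Because a Range and a List with the same elements are equal, the contract forces their hash codes to agree as well. A small sketch with illustrative names:

```scala
val range = 1 to 3
val list = List(1, 2, 3)

val equalAsSeqs: Boolean = range == list
// Required by the equals-hashCode contract.
val sameHash: Boolean = range.hashCode == list.hashCode

// As a consequence, a set treats them as one element.
val distinctCount: Int = Set[Seq[Int]](range, list).size
```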
Java interoperability
Implicit (less boilerplate):
import collection.JavaConversions._
javaCollection.filter(…)
Explicit (better control):
import collection.JavaConverters._
javaCollection.asScala.filter(…)
scalaCollection.asJava
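A runnable sketch of the explicit style, assuming Scala 2.13+, where the same asScala/asJava extension methods live in scala.jdk.CollectionConverters (on older versions the collection.JavaConverters import plays that role). The sample data is illustrative.

```scala
// Assumption: Scala 2.13+. On 2.12 and earlier, import collection.JavaConverters._
import scala.jdk.CollectionConverters._

val javaList: java.util.List[Int] = java.util.Arrays.asList(1, 2, 3, 4)

// asScala wraps the Java list without copying; toList then makes it immutable.
val evens: List[Int] = javaList.asScala.filter(_ % 2 == 0).toList

// asJava wraps a Scala collection for consumption by Java APIs.
val backToJava: java.util.List[String] = List("a", "b").asJava
```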
The power of type-level programming
Graph path-finding at compile time
import scala.language.implicitConversions
// Vertices
case class A(l: List[Char])
case class B(l: List[Char])
case class C(l: List[Char])
case class D(l: List[Char])
case class E(l: List[Char])
// Edges
implicit def ad[A1 <% A](x: A1) = D(x.l :+ 'A')
implicit def bc[B1 <% B](x: B1) = C(x.l :+ 'B')
implicit def ce[C1 <% C](x: C1) = E(x.l :+ 'C')
implicit def ea[E1 <% E](x: E1) = A(x.l :+ 'E')
def pathFrom(end:D) = end
pathFrom(B(Nil)) // res0: D = D(List(B, C, E, A))
Want to go Pro?
• Shapeless (Miles Sabin)
– Polytypic programming & Heterogeneous lists
– github.com/milessabin/shapeless
• Scalaxy (Olivier Chafik)
– Macros for boosting performance of collections
– github.com/ochafik/Scalaxy

Scala collections wizardry - Scalapeño
