SlideShare a Scribd company logo
1 of 28
Download to read offline
Fusing Transformations of Strict
Scala Collections with Views
learn about it through the work of…
Harold
Abelson
Gerald Jay
Sussman
Lex
Spoon
Bill
Venners
Runar
Bjarnason
Paul
Chiusano
Michael
Pilquist
Li Haoyi Frank
Sommers
Sergei
Winitzki
Alvin
Alexander
Martin
Odersky
@philip_schwarz
slides by http://fpilluminated.com/
This deck was inspired by the following tweet.
@philip_schwarz
With excerpts from some great books, this deck aims to provide a
good introduction to how the transformations of eager collections,
e.g. map and filter, can be fused using views.
3
There is a lot of overlap between the subject of views, and that of streams. For example, they
both address the problem of fusing transformations, and they both do it using laziness.
Because of that, I think it can be useful, before diving into the subject of views, to first go
through a brief refresher on (or introduction to) streams.
If you prefer to go straight to the main subject of the deck then you can just jump to slide 13.
4
Among the many transformer methods of collections (methods
that create new collections), map and filter stand out as
members of the following triad that is the bread, butter and jam
of functional programming.
map
λ
(define (product-of-squares-of-odd-elements sequence)
(accumulate *
1
(map square
(filter odd? sequence))))
(defn product-of-squares-of-odd-elements [sequence]
(accumulate *
1
(map square
(filter odd? sequence))))
def product_of_squares_of_odd_elements(sequence: List[Int]): Int =
sequence.filter(isOdd)
.map(square)
.foldRight(1)(_*_)
product_of_squares_of_odd_elements :: [Int] -> Int
product_of_squares_of_odd_elements sequence =
foldr (*)
1
(map square
(filter is_odd
sequence))
product_of_squares_of_odd_elements : [Nat] -> Nat
product_of_squares_of_odd_elements sequence =
foldRight (*)
1
(map square
(filter is_odd
sequence)) 5
I first came across the map, filter and fold triad many years ago, at university, while
reading Structure and Interpretation of Computer Programs (SICP).
The following four slides consist of SICP excerpts introducing the following two ideas:
1) Transformations of strict/eager lists are severely inefficient, because they
require creation and copying of data structures (that may be huge) at every step
of a process (sequence of transformations).
2) Streams are a clever idea that exploits delayed evaluation (non-
strictness/laziness) to permit the use of transformations without incurring
some of the costs of manipulating strict/eager lists.
6
3.5 Streams
...
From an abstract point of view, a stream is simply a sequence.
However, we will find that the straightforward implementation of streams as lists (as in section 2.2.1) doesn’t fully reveal the power
of stream processing.
As an alternative, we introduce the technique of delayed evaluation, which enables us to represent very large (even infinite)
sequences as streams. Stream processing lets us model systems that have state without ever using assignment or mutable data.
Structure and
Interpretation
of Computer Programs
2.2.1 Representing Sequences
Figure 2.4: The sequence 1,2,3,4 represented as a chain of pairs.
One of the useful structures we can build with pairs is a sequence -- an ordered collection of data objects. There are, of course, many ways to
represent sequences in terms of pairs. One particularly straightforward representation is illustrated in figure 2.4, where the sequence 1, 2, 3, 4 is
represented as a chain of pairs. The car of each pair is the corresponding item in the chain, and the cdr of the pair is the next pair in the chain.
The cdr of the final pair signals the end of the sequence by pointing to a distinguished value that is not a pair, represented in box-and-pointer
diagrams as a diagonal line and in programs as the value of the variable nil. The entire sequence is constructed by nested cons operations:
(cons 1
(cons 2
(cons 3
(cons 4 nil))))
7
3.5.1 Streams Are Delayed Lists
As we saw in section 2.2.3, sequences can serve as standard interfaces for combining program modules. We formulated powerful
abstractions for manipulating sequences, such as map, filter, and accumulate, that capture a wide variety of operations in a manner
that is both succinct and elegant.
Unfortunately, if we represent sequences as lists, this elegance is bought at the price of severe inefficiency with respect to both the
time and space required by our computations. When we represent manipulations on sequences as transformations of lists, our
programs must construct and copy data structures (which may be huge) at every step of a process.
Structure and
Interpretation
of Computer Programs
(define (map proc items)
(if (null? items)
nil
(cons (proc (car items))
(map proc (cdr items)))))
(define (filter predicate sequence)
(cond ((null? sequence) nil)
((predicate (car sequence))
(cons (car sequence)
(filter predicate (cdr sequence))))
(else (filter predicate (cdr sequence)))))
(define (accumulate op initial sequence)
(if (null? sequence)
initial
(op (car sequence)
(accumulate op initial (cdr sequence)))))
𝒇𝒐𝒍𝒅𝒓 ∷ 𝛼 → 𝛽 → 𝛽 → 𝛽 → 𝛼 → 𝛽
𝒇𝒐𝒍𝒅𝒓 𝑓 𝑒 = 𝑒
𝒇𝒐𝒍𝒅𝒓 𝑓 𝑒 𝑥: 𝑥𝑠 = 𝑓 𝑥 𝒇𝒐𝒍𝒅𝒓 𝑓 𝑒 𝑥𝑠
map ∷ (α → 𝛽) → [α] → [𝛽]
map f =
map f 𝑥 ∶ 𝑥𝑠 = 𝑓 𝑥 ∶ map 𝑓 𝑥𝑠
9ilter ∷ (α → 𝐵𝑜𝑜𝑙) → [α] → [α]
9ilter p =
9ilter p 𝑥 ∶ 𝑥𝑠 = 𝐢𝐟 𝑝 𝑥
𝐭𝐡𝐞𝐧 𝑥 ∶ 9ilter p 𝑥𝑠
𝐞𝐥𝐬𝐞 9ilter p 𝑥𝑠
𝑓𝑜𝑙𝑑𝑟 ∷ 𝛼 → 𝛽 → 𝛽 → 𝛽 → 𝛼 → 𝛽
𝑓𝑜𝑙𝑑𝑟 𝑓 𝑒 = 𝑒
𝑓𝑜𝑙𝑑𝑟 𝑓 𝑒 𝑥: 𝑥𝑠 = 𝑓 𝑥 𝑓𝑜𝑙𝑑𝑟 𝑓 𝑒 𝑥𝑠
map ∷ (α → 𝛽) → [α] → 𝛽
map f =
map f 𝑥 ∶ 𝑥𝑠 = 𝑓 𝑥 ∶ map 𝑓 𝑥𝑠
9ilter ∷ (α → 𝐵𝑜𝑜𝑙) → [α] → [α]
9ilter p =
9ilter p 𝑥 ∶ 𝑥𝑠 = 𝐢𝐟 𝑝 𝑥
𝐭𝐡𝐞𝐧 𝑥 ∶ 9ilter p 𝑥𝑠
𝐞𝐥𝐬𝐞 9ilter p 𝑥𝑠
8
To see why this is true, let us compare two programs for computing the sum of all the prime numbers in an interval. The first
program is written in standard iterative style:53
(define (sum-primes a b)
(define (iter count accum)
(cond ((> count b) accum)
((prime? count) (iter (+ count 1) (+ count accum)))
(else (iter (+ count 1) accum))))
(iter a 0))
The second program performs the same computation using the sequence operations of section 2.2.3:
(define (sum-primes a b)
(accumulate +
0
(filter prime? (enumerate-interval a b))))
In carrying out the computation, the first program needs to store only the sum being accumulated. In contrast, the filter in the
second program cannot do any testing until enumerate-interval has constructed a complete list of the numbers in the interval.
The filter generates another list, which in turn is passed to accumulate before being collapsed to form a sum.
Such large intermediate storage is not needed by the first program, which we can think of as enumerating the interval
incrementally, adding each prime to the sum as it is generated.
53 Assume that we have a predicate prime? (e.g., as in section 1.2.6) that tests for primality.
Structure and
Interpretation
of Computer Programs
9
The inefficiency in using lists becomes painfully apparent if we use the sequence paradigm to compute the second prime in the
interval from 10,000 to 1,000,000 by evaluating the expression
(car (cdr (filter prime?
(enumerate-interval 10000 1000000))))
This expression does find the second prime, but the computational overhead is outrageous. We construct a list of almost a million
integers, filter this list by testing each element for primality, and then ignore almost all of the result.
In a more traditional programming style, we would interleave the enumeration and the filtering, and stop when we reached the
second prime.
Streams are a clever idea that allows one to use sequence manipulations without incurring the costs of manipulating sequences as
lists.
With streams we can achieve the best of both worlds: We can formulate programs elegantly as sequence manipulations, while
attaining the efficiency of incremental computation.
The basic idea is to arrange to construct a stream only partially, and to pass the partial construction to the program that consumes
the stream.
If the consumer attempts to access a part of the stream that has not yet been constructed, the stream will automatically construct
just enough more of itself to produce the required part, thus preserving the illusion that the entire stream exists.
In other words, although we will write programs as if we were processing complete sequences, we design our stream
implementation to automatically and transparently interleave the construction of the stream with its use.
Structure and
Interpretation
of Computer Programs
10
Chapter 5 of Functional Programming in Scala explains how lazy lists
(streams) can be used to fuse together sequences of transformations.
See the next slide for how the book introduces the problem that
streams are meant to solve.
11
Chapter 5 - Strictness and laziness
In chapter 3 we talked about purely functional data structures, using singly linked lists as an example. We covered a number of
bulk operations on lists—map, filter, foldLeft, foldRight, zipWith, and so on. We noted that each of these
operations makes its own pass over the input and constructs a fresh list for the output.
Imagine if you had a deck of cards and you were asked to remove the odd-numbered cards and then flip over all the queens.
Ideally, you’d make a single pass through the deck, looking for queens and odd-numbered cards at the same time. This is
more efficient than removing the odd cards and then looking for queens in the remainder. And yet the latter is what Scala is
doing in the following code:
scala> List(1,2,3,4).map(_ + 10).filter(_ % 2 == 0).map(_ * 3)
List(36,42)
In this expression, map(_ + 10) will produce an intermediate list that then gets passed to filter(_ % 2 == 0), which
in turn constructs a list that gets passed to map(_ * 3), which then produces the final list. In other words, each
transformation will produce a temporary list that only ever gets used as input to the next transformation and is then
immediately discarded.
…
This view makes it clear how the calls to map and filter each perform their own traversal of the input and allocate lists for
the output. Wouldn’t it be nice if we could somehow fuse sequences of transformations like this into a single pass and avoid
creating temporary data structures?
We could rewrite the code into a while loop by hand, but ideally we’d like to have this done automatically while retaining the
same highlevel compositional style. We want to compose our programs using higher-order functions like map and filter instead
of writing monolithic loops.
It turns out that we can accomplish this kind of automatic loop fusion through the use of non-strictness (or, less formally,
laziness). In this chapter, we’ll explain what exactly this means, and we’ll work through the implementation of a lazy list type
that fuses sequences of transformations. Although building a “better” list is the motivation for this chapter, we’ll see that non-
strictness is a fundamental technique for improving on the efficiency and modularity of functional programs in general.
…
5.2 An extended example: lazy lists
Let’s now return to the problem posed at the beginning of this chapter. We’ll explore how laziness can be used to improve the
efficiency and modularity of functional programs using lazy lists, or streams, as an example. We’ll see how chains of
transformations on streams are fused into a single pass through the use of laziness. Here’s a simple Stream definition…
…
Functional Programming
In Scala
Paul Chiusano
Runar Bjarnason
@pchiusano
@runarorama
Michael Pilquist
@mpilquist 12
With that refresher on (or introduction to) streams out of the way, let’s
get back to the actual subject of this deck: views.
In my view (pun not intended) a great book to get started with is Li Haoyi’s
Hands-On Scala Programming. I find its approach to introducing views
unique, and I love how he includes diagrams to illustrate the concept.
@philip_schwarz
13
4.1.3 Transforms
@ Array(1, 2, 3, 4, 5).map(i => i * 2) // Multiply every element by 2
res7: Array[Int] = Array(2, 4, 6, 8, 10)
@ Array(1, 2, 3, 4, 5).filter(i => i % 2 == 1) // Keep only elements not divisible by 2
res8: Array[Int] = Array(1, 3, 5)
@ Array(1, 2, 3, 4, 5).take(2) // Keep first two elements
res9: Array[Int] = Array(1, 2)
@ Array(1, 2, 3, 4, 5).drop(2) // Discard first two elements
res10: Array[Int] = Array(3, 4, 5)
@ Array(1, 2, 3, 4, 5).slice(1, 4) // Keep elements from index 1-4
res11: Array[Int] = Array(2, 3, 4)
@ Array(1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5, 6, 7, 8).distinct // Removes all duplicates
res12: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8)
@ val a = Array(1, 2, 3, 4, 5)
a: Array[Int] = Array(1, 2, 3, 4, 5)
@ val a2 = a.map(x => x + 10)
a2: Array[Int] = Array(11, 12, 13, 14, 15)
@ a(0) // Note that `a` is unchanged!
res15: Int = 1
@ a2(0)
res16: Int = 11
Li Haoyi
@lihaoyi
The copying involved in these collection transformations does have some overhead, but
in most cases that should not cause issues. If a piece of code does turn out to be a
bottleneck that is slowing down your program, you can always convert your
.map/.filter/etc. transformation code into mutating operations over raw Arrays or In-Place
Operations (4.3.4) over Mutable Collections (4.3) to optimize for performance.
Transforms take an existing collection and create a new
collection modified in some way. Note that these
transformations create copies of the collection, and leave the
original unchanged. That means if you are still using the original
array, its contents will not be modified by the transform.
14
4.1.6 Combining Operations
It is common to chain more than one operation together to achieve what you want. For example, here is a function that
computes the standard deviation of an array of numbers:
@ def stdDev(a: Array[Double]): Double = {
val mean = a.foldLeft(0.0)(_ + _) / a.length
val squareErrors = a.map(_ - mean).map(x => x * x)
math.sqrt(squareErrors.foldLeft(0.0)(_ + _) / a.length)
}
Scala collections provide a convenient helper method .sum that is equivalent to .foldLeft(0.0)(_ + _), so the above code can be
simplified to…
As another example, here is a function that…
Chaining collection transformations in this manner will always have some overhead, but for most use cases the
overhead is worth the convenience and simplicity that these transforms give you. If collection transforms do become
a bottleneck, you can optimize the code using Views (4.1.8), In-Place Operations (4.3.4), or finally by looping over the
raw Arrays yourself.
Li Haoyi
@lihaoyi
4.1.8 Views
When you chain multiple transformations on a collection, we are creating many intermediate collections that are
immediately thrown away. For example, in the following snippet:
@ val myArray = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
@ val myNewArray = myArray.map(x => x + 1).filter(x => x % 2 == 0).slice(1, 3)
myNewArray: Array[Int] = Array(4, 6)
The chain of .map .filter .slice operations ends up traversing the collection three times, creating three new collections,
but only the last collection ends up being stored in myNewArray and the others are discarded.
15
This creation and traversal of intermediate collections is wasteful. In cases where you have long chains of collection
transformations that are becoming a performance bottleneck, you can use the .view method together with .to to "fuse"
the operations together:
@ val myNewArray = myArray.view.map(_ + 1).filter(_ % 2 == 0).slice(1, 3).to(Array)
myNewArray: Array[Int] = Array(4, 6)
Using .view before the map/filter/slice transformation operations defers the actual traversal and creation of a new
collection until later, when we call .to to convert it back into a concrete collection type:
This allows us to perform this chain of map/filter/slice transformations with only a single traversal, and only creating a
single output collection. This reduces the amount of unnecessary processing and memory allocations.
Li Haoyi
@lihaoyi
1 2 3 4 5 6 7 8 9
2 3 4 5 6 7 8 9 10
2 4 6 8 10
4 6
1 2 3 4 5 6 7 8 9
4 6
map(x => x + 1)
filter(x => x % 2 == 0)
slice(1, 3)
myArray
myNewArray
view map filter slice to
16
The next slide is just a quick visual recap.
17
val myNewArray =
myArray
.map(_ + 1)
.filter(_ % 2 == 0)
.slice(1, 3)
val myNewArray =
myArray
.view
.map(_ + 1)
.filter(_ % 2 == 0)
.slice(1, 3)
.to(Array)
1 2 3 4 5 6 7 8 9
2 3 4 5 6 7 8 9 10
2 4 6 8 10
4 6
myArray
myNewArray
1 2 3 4 5 6 7 8 9
4 6
myArray
myNewArray
map(_ + 1)
filter(_ % 2 == 0)
slice(1, 3)
view
map filter slice
to
• 1 traversal
• 1 collection created (no intermediate ones)
• call to view fuses map filter and slice together
• traversal is deferred until conversion to a concrete collection using .to
• 3 traversals
• 3 collections created (2 intermediate ones)
18
Next, let’s turn to ‘The’ Scala book.
19
24.13 Views
Collections have quite a few methods that construct new collections. Some examples are map, filter, and ++.
We call such methods transformers because they take at least one collection as their receiver object and produce
another collection in their result.
Transformers can be implemented in two principal ways: strict and nonstrict (or lazy).
A strict transformer constructs a new collection with all of its elements.
A non-strict, or lazy, transformer constructs only a proxy for the result collection, and its elements are
constructed on demand.
As an example of a non-strict transformer, consider the following implementation of a lazy map operation:
def lazyMap[T, U](col: Iterable[T], f: T => U) =
new Iterable[U]:
def iterator = col.iterator.map(f)
Note that lazyMap constructs a new Iterable without stepping through all elements of the given collection coll.
The given function f is instead applied to the elements of the new collection’s iterator as they are demanded.
Scala collections are by default strict in all their transformers, except for LazyList, which implements all its
transformer methods lazily.
However, there is a systematic way to turn every collection into a lazy one and vice versa, which is based on
collection views. A view is a special kind of collection that represents some base collection, but implements all
of its transformers lazily.
To go from a collection to its view, you can use the view method on the collection. If xs is some collection, then
xs.view is the same collection, but with all transformers implemented lazily. To get back from a view to a strict
collection, you can use the to conversion operation with a strict collection factory as parameter.
Martin Odersky
@odersky
20
As an example, say you have a vector of Ints over which you want to map two functions in succession:
val v = Vector((1 to 10)*)
// Vector(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
v.map(_ + 1).map(_ * 2)
// Vector(4, 6, 8, 10, 12, 14, 16, 18, 20, 22)
In the last statement, the expression v.map(_ + 1) constructs a new vector that is then transformed into a third
vector by the second call to map(_ * 2).
In many situations, constructing the intermediate result from the first call to map is a bit wasteful. In the pseudo
example, it would be faster to do a single map with the composition of the two functions (_ + 1) and (_ * 2).
If you have the two functions available in the same place you can do this by hand. But quite often, successive
transformations of a data structure are done in different program modules. Fusing those transformations would
then undermine modularity. A more general way to avoid the intermediate results is by turning the vector first
into a view, applying all transformations to the view, and finally forcing the view to a vector:
(v.view.map(_ + 1).map(_ * 2)).toVector
// Vector(4, 6, 8, 10, 12, 14, 16, 18, 20, 22)
We’ll do this sequence of operations again, one by one:
scala> val vv = v.view
val vv: scala.collection.IndexedSeqView[Int] = IndexedSeqView(<not computed>)
The application v.view gives you an IndexedSeqView, a lazily evaluated IndexedSeq. As with LazyList, toString
on views does not force the view elements. That’s why the vv’s elements are displayed as not computed.
Bill Venners
@bvenners
21
Applying the first map to the view gives you:
scala> vv.map(_ + 1)
val res13: scala.collection.IndexedSeqView[Int] = IndexedSeqView(<not computed>)
The result of the map is another IndexedSeqView[Int] value. This is in essence a wrapper that records the fact
that a map with function (_ + 1) needs to be applied on the vector v. It does not apply that map until the view is
forced, however.
We’ll now apply the second map to the last result.
scala> res13.map(_ * 2)
val res14: scala.collection.IndexedSeqView[Int] = IndexedSeqView(<not computed>)
Finally, forcing the last result gives:
scala> res14.toVector
val res15: Seq[Int] = Vector(4, 6, 8, 10, 12, 14, 16, 18, 20, 22)
Both stored functions, (_ + 1) and (_ * 2), get applied as part of the execution of the to operation and a new vector
is constructed. That way, no intermediate data structure is needed.
Transformation operations applied to views don’t build a new data structure. Instead, they return an iterable
whose iterator is the result of applying the transformation operation to the underlying collection’s iterator.
The main reason for using views is performance. You have seen that by switching a collection to a view the
construction of intermediate results can be avoided. These savings can be quite important.
Lex Spoon
22
As another example, consider the problem of finding the first palindrome in a list of words. A palindrome is a
word that reads backwards the same as forwards. Here are the necessary definitions:
def isPalindrome(x: String) = x == x.reverse
def findPalindrome(s: Iterable[String]) = s.find(isPalindrome)
Now, assume you have a very long sequence, words, and you want to find a palindrome in the first million words
of that sequence. Can you re-use the definition of findPalindrome? Of course, you could write:
findPalindrome(words.take(1000000))
This nicely separates the two aspects of taking the first million words of a sequence and finding a palindrome in
it. But the downside is that it always constructs an intermediary sequence consisting of one million words, even
if the first word of that sequence is already a palindrome. So potentially, 999,999 words are copied into the
intermediary result without being inspected at all afterwards. Many programmers would give up here and write
their own specialized version of finding palindromes in some given prefix of an argument sequence. But with
views, you don’t have to.
Simply write:
findPalindrome(words.view.take(1000000))
This has the same nice separation of concerns, but instead of a sequence of a million elements it will only
construct a single lightweight view object. This way, you do not need to choose between performance and
modularity.
After having seen all these nifty uses of views you might wonder why have strict collections at all? One reason
is that performance comparisons do not always favor lazy over strict collections. For smaller collection sizes the
added overhead of forming and applying closures in views is often greater than the gain from avoiding the
intermediary data structures. A possibly more important reason is that evaluation in views can be very confusing
if the delayed operations have side effects.
Frank Sommers
23
In the next excerpt, Sergei Winitzki’s coverage of
views, in the Science of Functional Programming, is
short and sweet, but also nice and practical, with an
emphasis on memory usage.
24
The code (1L to 1_000_000_000L).sum works because (1 to n) produces a sequence whose elements are computed
whenever needed but do not remain in memory. This can be seen as a sequence with the “on-call” availability of
elements. Sequences of this sort are called iterators:
scala> 1 to 5
res0: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5)
scala> 1 until 5
res1: scala.collection.immutable.Range = Range(1, 2, 3, 4)
The types Range and Range.Inclusive are defined in the Scala standard library and are iterators. They behave as
collections and support the usual methods (map, filter, etc.), but they do not store previously computed values in
memory.
The view method
Eager collections such as List or Array can be converted to iterators by using the view method. This is necessary when
intermediate collections consume too much memory when fully evaluated.
For example, consider the computation of Example 2.1.5.7 where we used flatMap to replace each element of an initial
sequence by three new numbers before computing max of the resulting collection. If instead of three new numbers we
wanted to compute three million new numbers each time, the intermediate collection created by flatMap would require
too much memory, and the computation would crash:
scala> (1 to 10).flatMap(x => 1 to 3_000_000).max
java.lang.OutOfMemoryError: GC overhead limit exceeded
Even though the range (1 to 10) is an iterator, a subsequent flatMap operation creates an intermediate collection that
is too large for our computer’s memory.
We can use view to avoid this:
scala> (1 to 10).view.flatMap(x => 1 to 3_000_000).max
res0: Int = 3_000_000
…
Sergei Winitzki
sergei-winitzki-11a6431
25
NOTES (from ‘Programming in Scala’): In Scala versions before 2.8, the Range
type was lazy, so it behaved in effect like a view. Since 2.8, all collections
except lazy lists and views are strict. The only way to go from a strict to a
lazy collection is via the view method. The only way to go back is via to.
And last, but not by no means least, let’s look at
Alvin Alexander’s Scala Cookbook recipe for
Creating a Lazy View on a Collection.
26
11.4 Creating a Lazy View on a Collection
Problem
You’re working with a large collection and want to create a lazy version of it so it will only compute and return results as they are
needed.
Solution
Create a view on the collection by calling its view method. That creates a new collection whose transformer methods are
implemented in a nonstrict, or lazy, manner. For example, given a large list:
val xs = List.range(0, 3_000_000) // a list from 0 to 2,999,999
imagine that you want to call several transformation methods on it, such as map and filter. This is a contrived example, but it
demonstrates a problem:
val ys = xs.map(_ + 1).map(_ * 10).filter(_ > 1_000).filter(_ < 10_000)
If you attempt to run that example in the REPL, you’ll probably see this fatal “out of memory” error:
scala> val ys = xs.map(_ + 1) java.lang.OutOfMemoryError: GC overhead limit exceeded
Conversely, this example returns almost immediately and doesn’t throw an error because all it does is create a view and then
four lazy transformer methods:
val ys = xs.view.map(_ + 1).map(_ * 10).filter(_ > 1_000).filter(_ < 10_000)
// result: ys: scala.collection.View[Int] = View(<not computed>)
Now you can work with ys without running out of memory:
scala> ys.take(3).foreach(println)
1010
1020
1030
Calling view on a collection makes the resulting collection lazy. Now when transformer methods are called on the view, the
elements will only be calculated as they are accessed, and not “eagerly,” as they normally would be with a strict collection.
Alvin Alexander
@alvinalexander
27
Discussion
…
The use case for views
The main use case for using a view is performance, in terms of speed, memory, or both.
Regarding performance, the example in the Solution first demonstrates (a) a strict approach that runs out of memory, and then (b)
a lazy approach that lets you work with the same dataset.
The problem with the first solution is that it attempts to create new, intermediate collections each time a transformer method is
called:
val b = a.map(_ + 1) // 1st copy of the data
.map(_ * 10) // 2nd copy of the data
.filter(_ > 1_000) // 3rd copy of the data
.filter(_ < 10_000) // 4th copy of the data
If the initial collection a has one billion elements, the first map call creates a new intermediate collection with another billion
elements.
The second map call creates another collection, so now we’re attempting to hold three billion elements in memory, and so on.
To drive that point home, that approach is the same as if you had written this:
val a = List.range(0, 1_000_000_000) // 1B elements in RAM
val b = a.map(_ + 1) // 1st copy of the data (2B elements in RAM)
val c = b.map(_ * 10) // 2nd copy of the data (3B elements in RAM)
val d = c.filter(_ > 1_000) // 3rd copy of the data (~4B total)
val e = d.filter(_ < 10_000) // 4th copy of the data (~4B total)
Conversely, when you immediately create a view on the collection, everything after that essentially just creates an iterator:
val ys = a.view.map ... // this DOES NOT create another one billion elements
As usual with anything related to performance, be sure to test using a view versus not using a view in your application to find what
works best. Another performance-related reason to understand views is that it’s become very common to work with large datasets in a
streaming manner, and views work very similar to streams.
…
Alvin Alexander
@alvinalexander
28
That’s all. I hope you found it useful.
@philip_schwarz
29

More Related Content

What's hot

Sequence and Traverse - Part 1
Sequence and Traverse - Part 1Sequence and Traverse - Part 1
Sequence and Traverse - Part 1Philip Schwarz
 
Pharo 10 and beyond
 Pharo 10 and beyond Pharo 10 and beyond
Pharo 10 and beyondESUG
 
Bloc for Pharo: Current State and Future Perspective
Bloc for Pharo: Current State and Future PerspectiveBloc for Pharo: Current State and Future Perspective
Bloc for Pharo: Current State and Future PerspectiveESUG
 
Why The Free Monad isn't Free
Why The Free Monad isn't FreeWhy The Free Monad isn't Free
Why The Free Monad isn't FreeKelley Robinson
 
Peeking inside the engine of ZIO SQL.pdf
Peeking inside the engine of ZIO SQL.pdfPeeking inside the engine of ZIO SQL.pdf
Peeking inside the engine of ZIO SQL.pdfJaroslavRegec1
 
Gentle Introduction to Scala
Gentle Introduction to ScalaGentle Introduction to Scala
Gentle Introduction to ScalaFangda Wang
 
Functional Smalltalk
Functional SmalltalkFunctional Smalltalk
Functional SmalltalkESUG
 
EuroBSDcon 2017 System Performance Analysis Methodologies
EuroBSDcon 2017 System Performance Analysis MethodologiesEuroBSDcon 2017 System Performance Analysis Methodologies
EuroBSDcon 2017 System Performance Analysis MethodologiesBrendan Gregg
 
Manuales SP3D Equipement
Manuales SP3D EquipementManuales SP3D Equipement
Manuales SP3D Equipementorv_80
 

What's hot (14)

Preparing for Scala 3
Preparing for Scala 3Preparing for Scala 3
Preparing for Scala 3
 
Scala collection
Scala collectionScala collection
Scala collection
 
Applicative style programming
Applicative style programmingApplicative style programming
Applicative style programming
 
Sequence and Traverse - Part 1
Sequence and Traverse - Part 1Sequence and Traverse - Part 1
Sequence and Traverse - Part 1
 
Pharo 10 and beyond
 Pharo 10 and beyond Pharo 10 and beyond
Pharo 10 and beyond
 
Bloc for Pharo: Current State and Future Perspective
Bloc for Pharo: Current State and Future PerspectiveBloc for Pharo: Current State and Future Perspective
Bloc for Pharo: Current State and Future Perspective
 
Query planner
Query plannerQuery planner
Query planner
 
Why The Free Monad isn't Free
Why The Free Monad isn't FreeWhy The Free Monad isn't Free
Why The Free Monad isn't Free
 
Peeking inside the engine of ZIO SQL.pdf
Peeking inside the engine of ZIO SQL.pdfPeeking inside the engine of ZIO SQL.pdf
Peeking inside the engine of ZIO SQL.pdf
 
Gentle Introduction to Scala
Gentle Introduction to ScalaGentle Introduction to Scala
Gentle Introduction to Scala
 
Functional Smalltalk
Functional SmalltalkFunctional Smalltalk
Functional Smalltalk
 
EuroBSDcon 2017 System Performance Analysis Methodologies
EuroBSDcon 2017 System Performance Analysis MethodologiesEuroBSDcon 2017 System Performance Analysis Methodologies
EuroBSDcon 2017 System Performance Analysis Methodologies
 
Manuales SP3D Equipement
Manuales SP3D EquipementManuales SP3D Equipement
Manuales SP3D Equipement
 
Lazy java
Lazy javaLazy java
Lazy java
 

Similar to Fusing Transformations of Strict Scala Collections with Views

The Functional Programming Triad of Map, Filter and Fold
The Functional Programming Triad of Map, Filter and FoldThe Functional Programming Triad of Map, Filter and Fold
The Functional Programming Triad of Map, Filter and FoldPhilip Schwarz
 
5 parallel implementation 06299286
5 parallel implementation 062992865 parallel implementation 06299286
5 parallel implementation 06299286Ninad Samel
 
4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply FunctionSakthi Dasans
 
1183 c-interview-questions-and-answers
1183 c-interview-questions-and-answers1183 c-interview-questions-and-answers
1183 c-interview-questions-and-answersAkash Gawali
 
Data Structure.pdf
Data Structure.pdfData Structure.pdf
Data Structure.pdfMemeMiner
 
Map-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on MulticoreMap-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on Multicoreillidan2004
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELJoel Falcou
 
Functional Programming With Scala
Functional Programming With ScalaFunctional Programming With Scala
Functional Programming With ScalaKnoldus Inc.
 
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageJanuary 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageZurich_R_User_Group
 
Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06Barry DeCicco
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationGeoffrey Fox
 
Functional programming with Scala
Functional programming with ScalaFunctional programming with Scala
Functional programming with ScalaNeelkanth Sachdeva
 
CSCI 383 Lecture 3 and 4: Abstraction
CSCI 383 Lecture 3 and 4: AbstractionCSCI 383 Lecture 3 and 4: Abstraction
CSCI 383 Lecture 3 and 4: AbstractionJI Ruan
 

Similar to Fusing Transformations of Strict Scala Collections with Views (20)

The Functional Programming Triad of Map, Filter and Fold
The Functional Programming Triad of Map, Filter and FoldThe Functional Programming Triad of Map, Filter and Fold
The Functional Programming Triad of Map, Filter and Fold
 
5 parallel implementation 06299286
5 parallel implementation 062992865 parallel implementation 06299286
5 parallel implementation 06299286
 
4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function
 
Statistics lab 1
Statistics lab 1Statistics lab 1
Statistics lab 1
 
Bigdata analytics
Bigdata analyticsBigdata analytics
Bigdata analytics
 
Unit-2 Hadoop Framework.pdf
Unit-2 Hadoop Framework.pdfUnit-2 Hadoop Framework.pdf
Unit-2 Hadoop Framework.pdf
 
Unit-2 Hadoop Framework.pdf
Unit-2 Hadoop Framework.pdfUnit-2 Hadoop Framework.pdf
Unit-2 Hadoop Framework.pdf
 
1183 c-interview-questions-and-answers
1183 c-interview-questions-and-answers1183 c-interview-questions-and-answers
1183 c-interview-questions-and-answers
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Data Structure.pdf
Data Structure.pdfData Structure.pdf
Data Structure.pdf
 
Map-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on MulticoreMap-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on Multicore
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSEL
 
Functional Programming With Scala
Functional Programming With ScalaFunctional Programming With Scala
Functional Programming With Scala
 
2017 nov reflow sbtb
2017 nov reflow sbtb2017 nov reflow sbtb
2017 nov reflow sbtb
 
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageJanuary 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
 
Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
Oct.22nd.Presentation.Final
Oct.22nd.Presentation.FinalOct.22nd.Presentation.Final
Oct.22nd.Presentation.Final
 
Functional programming with Scala
Functional programming with ScalaFunctional programming with Scala
Functional programming with Scala
 
CSCI 383 Lecture 3 and 4: Abstraction
CSCI 383 Lecture 3 and 4: AbstractionCSCI 383 Lecture 3 and 4: Abstraction
CSCI 383 Lecture 3 and 4: Abstraction
 

More from Philip Schwarz

Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Folding Cheat Sheet #3 - third in a series
Folding Cheat Sheet #3 - third in a seriesFolding Cheat Sheet #3 - third in a series
Folding Cheat Sheet #3 - third in a seriesPhilip Schwarz
 
Folding Cheat Sheet #2 - second in a series
Folding Cheat Sheet #2 - second in a seriesFolding Cheat Sheet #2 - second in a series
Folding Cheat Sheet #2 - second in a seriesPhilip Schwarz
 
Folding Cheat Sheet #1 - first in a series
Folding Cheat Sheet #1 - first in a seriesFolding Cheat Sheet #1 - first in a series
Folding Cheat Sheet #1 - first in a seriesPhilip Schwarz
 
Scala Left Fold Parallelisation - Three Approaches
Scala Left Fold Parallelisation- Three ApproachesScala Left Fold Parallelisation- Three Approaches
Scala Left Fold Parallelisation - Three ApproachesPhilip Schwarz
 
A sighting of traverse_ function in Practical FP in Scala
A sighting of traverse_ function in Practical FP in ScalaA sighting of traverse_ function in Practical FP in Scala
A sighting of traverse_ function in Practical FP in ScalaPhilip Schwarz
 
A sighting of traverseFilter and foldMap in Practical FP in Scala
A sighting of traverseFilter and foldMap in Practical FP in ScalaA sighting of traverseFilter and foldMap in Practical FP in Scala
A sighting of traverseFilter and foldMap in Practical FP in ScalaPhilip Schwarz
 
A sighting of sequence function in Practical FP in Scala
A sighting of sequence function in Practical FP in ScalaA sighting of sequence function in Practical FP in Scala
A sighting of sequence function in Practical FP in ScalaPhilip Schwarz
 
N-Queens Combinatorial Puzzle meets Cats
N-Queens Combinatorial Puzzle meets CatsN-Queens Combinatorial Puzzle meets Cats
N-Queens Combinatorial Puzzle meets CatsPhilip Schwarz
 
Kleisli composition, flatMap, join, map, unit - implementation and interrelat...
Kleisli composition, flatMap, join, map, unit - implementation and interrelat...Kleisli composition, flatMap, join, map, unit - implementation and interrelat...
Kleisli composition, flatMap, join, map, unit - implementation and interrelat...Philip Schwarz
 
The aggregate function - from sequential and parallel folds to parallel aggre...
The aggregate function - from sequential and parallel folds to parallel aggre...The aggregate function - from sequential and parallel folds to parallel aggre...
The aggregate function - from sequential and parallel folds to parallel aggre...Philip Schwarz
 
Nat, List and Option Monoids - from scratch - Combining and Folding - an example
Nat, List and Option Monoids -from scratch -Combining and Folding -an exampleNat, List and Option Monoids -from scratch -Combining and Folding -an example
Nat, List and Option Monoids - from scratch - Combining and Folding - an examplePhilip Schwarz
 
Nat, List and Option Monoids - from scratch - Combining and Folding - an example
Nat, List and Option Monoids -from scratch -Combining and Folding -an exampleNat, List and Option Monoids -from scratch -Combining and Folding -an example
Nat, List and Option Monoids - from scratch - Combining and Folding - an examplePhilip Schwarz
 
The Sieve of Eratosthenes - Part II - Genuine versus Unfaithful Sieve - Haske...
The Sieve of Eratosthenes - Part II - Genuine versus Unfaithful Sieve - Haske...The Sieve of Eratosthenes - Part II - Genuine versus Unfaithful Sieve - Haske...
The Sieve of Eratosthenes - Part II - Genuine versus Unfaithful Sieve - Haske...Philip Schwarz
 
Sum and Product Types - The Fruit Salad & Fruit Snack Example - From F# to Ha...
Sum and Product Types -The Fruit Salad & Fruit Snack Example - From F# to Ha...Sum and Product Types -The Fruit Salad & Fruit Snack Example - From F# to Ha...
Sum and Product Types - The Fruit Salad & Fruit Snack Example - From F# to Ha...Philip Schwarz
 
Algebraic Data Types for Data Oriented Programming - From Haskell and Scala t...
Algebraic Data Types forData Oriented Programming - From Haskell and Scala t...Algebraic Data Types forData Oriented Programming - From Haskell and Scala t...
Algebraic Data Types for Data Oriented Programming - From Haskell and Scala t...Philip Schwarz
 
Jordan Peterson - The pursuit of meaning and related ethical axioms
Jordan Peterson - The pursuit of meaning and related ethical axiomsJordan Peterson - The pursuit of meaning and related ethical axioms
Jordan Peterson - The pursuit of meaning and related ethical axiomsPhilip Schwarz
 
Defining filter using (a) recursion (b) folding (c) folding with S, B and I c...
Defining filter using (a) recursion (b) folding (c) folding with S, B and I c...Defining filter using (a) recursion (b) folding (c) folding with S, B and I c...
Defining filter using (a) recursion (b) folding (c) folding with S, B and I c...Philip Schwarz
 
Defining filter using (a) recursion (b) folding with S, B and I combinators (...
Defining filter using (a) recursion (b) folding with S, B and I combinators (...Defining filter using (a) recursion (b) folding with S, B and I combinators (...
Defining filter using (a) recursion (b) folding with S, B and I combinators (...Philip Schwarz
 
The Sieve of Eratosthenes - Part 1 - with minor corrections
The Sieve of Eratosthenes - Part 1 - with minor correctionsThe Sieve of Eratosthenes - Part 1 - with minor corrections
The Sieve of Eratosthenes - Part 1 - with minor correctionsPhilip Schwarz
 

More from Philip Schwarz (20)

Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Folding Cheat Sheet #3 - third in a series
Folding Cheat Sheet #3 - third in a seriesFolding Cheat Sheet #3 - third in a series
Folding Cheat Sheet #3 - third in a series
 
Folding Cheat Sheet #2 - second in a series
Folding Cheat Sheet #2 - second in a seriesFolding Cheat Sheet #2 - second in a series
Folding Cheat Sheet #2 - second in a series
 
Folding Cheat Sheet #1 - first in a series
Folding Cheat Sheet #1 - first in a seriesFolding Cheat Sheet #1 - first in a series
Folding Cheat Sheet #1 - first in a series
 
Scala Left Fold Parallelisation - Three Approaches
Scala Left Fold Parallelisation- Three ApproachesScala Left Fold Parallelisation- Three Approaches
Scala Left Fold Parallelisation - Three Approaches
 
A sighting of traverse_ function in Practical FP in Scala
A sighting of traverse_ function in Practical FP in ScalaA sighting of traverse_ function in Practical FP in Scala
A sighting of traverse_ function in Practical FP in Scala
 
A sighting of traverseFilter and foldMap in Practical FP in Scala
A sighting of traverseFilter and foldMap in Practical FP in ScalaA sighting of traverseFilter and foldMap in Practical FP in Scala
A sighting of traverseFilter and foldMap in Practical FP in Scala
 
A sighting of sequence function in Practical FP in Scala
A sighting of sequence function in Practical FP in ScalaA sighting of sequence function in Practical FP in Scala
A sighting of sequence function in Practical FP in Scala
 
N-Queens Combinatorial Puzzle meets Cats
N-Queens Combinatorial Puzzle meets CatsN-Queens Combinatorial Puzzle meets Cats
N-Queens Combinatorial Puzzle meets Cats
 
Kleisli composition, flatMap, join, map, unit - implementation and interrelat...
Kleisli composition, flatMap, join, map, unit - implementation and interrelat...Kleisli composition, flatMap, join, map, unit - implementation and interrelat...
Kleisli composition, flatMap, join, map, unit - implementation and interrelat...
 
The aggregate function - from sequential and parallel folds to parallel aggre...
The aggregate function - from sequential and parallel folds to parallel aggre...The aggregate function - from sequential and parallel folds to parallel aggre...
The aggregate function - from sequential and parallel folds to parallel aggre...
 
Nat, List and Option Monoids - from scratch - Combining and Folding - an example
Nat, List and Option Monoids -from scratch -Combining and Folding -an exampleNat, List and Option Monoids -from scratch -Combining and Folding -an example
Nat, List and Option Monoids - from scratch - Combining and Folding - an example
 
Nat, List and Option Monoids - from scratch - Combining and Folding - an example
Nat, List and Option Monoids -from scratch -Combining and Folding -an exampleNat, List and Option Monoids -from scratch -Combining and Folding -an example
Nat, List and Option Monoids - from scratch - Combining and Folding - an example
 
The Sieve of Eratosthenes - Part II - Genuine versus Unfaithful Sieve - Haske...
The Sieve of Eratosthenes - Part II - Genuine versus Unfaithful Sieve - Haske...The Sieve of Eratosthenes - Part II - Genuine versus Unfaithful Sieve - Haske...
The Sieve of Eratosthenes - Part II - Genuine versus Unfaithful Sieve - Haske...
 
Sum and Product Types - The Fruit Salad & Fruit Snack Example - From F# to Ha...
Sum and Product Types -The Fruit Salad & Fruit Snack Example - From F# to Ha...Sum and Product Types -The Fruit Salad & Fruit Snack Example - From F# to Ha...
Sum and Product Types - The Fruit Salad & Fruit Snack Example - From F# to Ha...
 
Algebraic Data Types for Data Oriented Programming - From Haskell and Scala t...
Algebraic Data Types forData Oriented Programming - From Haskell and Scala t...Algebraic Data Types forData Oriented Programming - From Haskell and Scala t...
Algebraic Data Types for Data Oriented Programming - From Haskell and Scala t...
 
Jordan Peterson - The pursuit of meaning and related ethical axioms
Jordan Peterson - The pursuit of meaning and related ethical axiomsJordan Peterson - The pursuit of meaning and related ethical axioms
Jordan Peterson - The pursuit of meaning and related ethical axioms
 
Defining filter using (a) recursion (b) folding (c) folding with S, B and I c...
Defining filter using (a) recursion (b) folding (c) folding with S, B and I c...Defining filter using (a) recursion (b) folding (c) folding with S, B and I c...
Defining filter using (a) recursion (b) folding (c) folding with S, B and I c...
 
Defining filter using (a) recursion (b) folding with S, B and I combinators (...
Defining filter using (a) recursion (b) folding with S, B and I combinators (...Defining filter using (a) recursion (b) folding with S, B and I combinators (...
Defining filter using (a) recursion (b) folding with S, B and I combinators (...
 
The Sieve of Eratosthenes - Part 1 - with minor corrections
The Sieve of Eratosthenes - Part 1 - with minor correctionsThe Sieve of Eratosthenes - Part 1 - with minor corrections
The Sieve of Eratosthenes - Part 1 - with minor corrections
 

Recently uploaded

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 

Recently uploaded (20)

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 

Fusing Transformations of Strict Scala Collections with Views

  • 1. Fusing Transformations of Strict Scala Collections with Views learn about it through the work of… Harold Abelson Gerald Jay Sussman Lex Spoon Bill Venners Runar Bjarnason Paul Chiusano Michael Pilquist Li Haoyi Frank Sommers Sergei Winitzki Alvin Alexander Martin Odersky @philip_schwarz slides by http://fpilluminated.com/
  • 2. This deck was inspired by the following tweet. @philip_schwarz With excerpts from some great books, this deck aims to provide a good introduction to how the transformations of eager collections, e.g. map and filter, can be fused using views. 3
  • 3. There is a lot of overlap between the subject of views, and that of streams. For example, they both address the problem of fusing transformations, and they both do it using laziness. Because of that, I think it can be useful, before diving into the subject of views, to first go through a brief refresher on (or introduction to) streams. If you prefer to go straight to the main subject of the deck then you can just jump to slide 13. 4
  • 4. Among the many transformer methods of collections (methods that create new collections), map and filter stand out as members of the following triad that is the bread, butter and jam of functional programming. map λ (define (product-of-squares-of-odd-elements sequence) (accumulate * 1 (map square (filter odd? sequence)))) (defn product-of-squares-of-odd-elements [sequence] (accumulate * 1 (map square (filter odd? sequence)))) def product_of_squares_of_odd_elements(sequence: List[Int]): Int = sequence.filter(isOdd) .map(square) .foldRight(1)(_*_) product_of_squares_of_odd_elements :: [Int] -> Int product_of_squares_of_odd_elements sequence = foldr (*) 1 (map square (filter is_odd sequence)) product_of_squares_of_odd_elements : [Nat] -> Nat product_of_squares_of_odd_elements sequence = foldRight (*) 1 (map square (filter is_odd sequence)) 5
  • 5. I first came across the map, filter and fold triad many years ago, at university, while reading Structure and Interpretation of Computer Programs (SICP). The following four slides consist of SICP excerpts introducing the following two ideas: 1) Transformations of strict/eager lists are severely inefficient, because they require creation and copying of data structures (that may be huge) at every step of a process (sequence of transformations). 2) Streams are a clever idea that exploits delayed evaluation (non- strictness/laziness) to permit the use of transformations without incurring some of the costs of manipulating strict/eager lists. 6
  • 6. 3.5 Streams ... From an abstract point of view, a stream is simply a sequence. However, we will find that the straightforward implementation of streams as lists (as in section 2.2.1) doesn’t fully reveal the power of stream processing. As an alternative, we introduce the technique of delayed evaluation, which enables us to represent very large (even infinite) sequences as streams. Stream processing lets us model systems that have state without ever using assignment or mutable data. Structure and Interpretation of Computer Programs 2.2.1 Representing Sequences Figure 2.4: The sequence 1,2,3,4 represented as a chain of pairs. One of the useful structures we can build with pairs is a sequence -- an ordered collection of data objects. There are, of course, many ways to represent sequences in terms of pairs. One particularly straightforward representation is illustrated in figure 2.4, where the sequence 1, 2, 3, 4 is represented as a chain of pairs. The car of each pair is the corresponding item in the chain, and the cdr of the pair is the next pair in the chain. The cdr of the final pair signals the end of the sequence by pointing to a distinguished value that is not a pair, represented in box-and-pointer diagrams as a diagonal line and in programs as the value of the variable nil. The entire sequence is constructed by nested cons operations: (cons 1 (cons 2 (cons 3 (cons 4 nil)))) 7
  • 7. 3.5.1 Streams Are Delayed Lists As we saw in section 2.2.3, sequences can serve as standard interfaces for combining program modules. We formulated powerful abstractions for manipulating sequences, such as map, filter, and accumulate, that capture a wide variety of operations in a manner that is both succinct and elegant. Unfortunately, if we represent sequences as lists, this elegance is bought at the price of severe inefficiency with respect to both the time and space required by our computations. When we represent manipulations on sequences as transformations of lists, our programs must construct and copy data structures (which may be huge) at every step of a process. Structure and Interpretation of Computer Programs (define (map proc items) (if (null? items) nil (cons (proc (car items)) (map proc (cdr items))))) (define (filter predicate sequence) (cond ((null? sequence) nil) ((predicate (car sequence)) (cons (car sequence) (filter predicate (cdr sequence)))) (else (filter predicate (cdr sequence))))) (define (accumulate op initial sequence) (if (null? sequence) initial (op (car sequence) (accumulate op initial (cdr sequence))))) 𝒇𝒐𝒍𝒅𝒓 ∷ 𝛼 → 𝛽 → 𝛽 → 𝛽 → 𝛼 → 𝛽 𝒇𝒐𝒍𝒅𝒓 𝑓 𝑒 = 𝑒 𝒇𝒐𝒍𝒅𝒓 𝑓 𝑒 𝑥: 𝑥𝑠 = 𝑓 𝑥 𝒇𝒐𝒍𝒅𝒓 𝑓 𝑒 𝑥𝑠 map ∷ (α → 𝛽) → [α] → [𝛽] map f = map f 𝑥 ∶ 𝑥𝑠 = 𝑓 𝑥 ∶ map 𝑓 𝑥𝑠 9ilter ∷ (α → 𝐵𝑜𝑜𝑙) → [α] → [α] 9ilter p = 9ilter p 𝑥 ∶ 𝑥𝑠 = 𝐢𝐟 𝑝 𝑥 𝐭𝐡𝐞𝐧 𝑥 ∶ 9ilter p 𝑥𝑠 𝐞𝐥𝐬𝐞 9ilter p 𝑥𝑠 𝑓𝑜𝑙𝑑𝑟 ∷ 𝛼 → 𝛽 → 𝛽 → 𝛽 → 𝛼 → 𝛽 𝑓𝑜𝑙𝑑𝑟 𝑓 𝑒 = 𝑒 𝑓𝑜𝑙𝑑𝑟 𝑓 𝑒 𝑥: 𝑥𝑠 = 𝑓 𝑥 𝑓𝑜𝑙𝑑𝑟 𝑓 𝑒 𝑥𝑠 map ∷ (α → 𝛽) → [α] → 𝛽 map f = map f 𝑥 ∶ 𝑥𝑠 = 𝑓 𝑥 ∶ map 𝑓 𝑥𝑠 9ilter ∷ (α → 𝐵𝑜𝑜𝑙) → [α] → [α] 9ilter p = 9ilter p 𝑥 ∶ 𝑥𝑠 = 𝐢𝐟 𝑝 𝑥 𝐭𝐡𝐞𝐧 𝑥 ∶ 9ilter p 𝑥𝑠 𝐞𝐥𝐬𝐞 9ilter p 𝑥𝑠 8
  • 8. To see why this is true, let us compare two programs for computing the sum of all the prime numbers in an interval. The first program is written in standard iterative style:53 (define (sum-primes a b) (define (iter count accum) (cond ((> count b) accum) ((prime? count) (iter (+ count 1) (+ count accum))) (else (iter (+ count 1) accum)))) (iter a 0)) The second program performs the same computation using the sequence operations of section 2.2.3: (define (sum-primes a b) (accumulate + 0 (filter prime? (enumerate-interval a b)))) In carrying out the computation, the first program needs to store only the sum being accumulated. In contrast, the filter in the second program cannot do any testing until enumerate-interval has constructed a complete list of the numbers in the interval. The filter generates another list, which in turn is passed to accumulate before being collapsed to form a sum. Such large intermediate storage is not needed by the first program, which we can think of as enumerating the interval incrementally, adding each prime to the sum as it is generated. 53 Assume that we have a predicate prime? (e.g., as in section 1.2.6) that tests for primality. Structure and Interpretation of Computer Programs 9
  • 9. The inefficiency in using lists becomes painfully apparent if we use the sequence paradigm to compute the second prime in the interval from 10,000 to 1,000,000 by evaluating the expression (car (cdr (filter prime? (enumerate-interval 10000 1000000)))) This expression does find the second prime, but the computational overhead is outrageous. We construct a list of almost a million integers, filter this list by testing each element for primality, and then ignore almost all of the result. In a more traditional programming style, we would interleave the enumeration and the filtering, and stop when we reached the second prime. Streams are a clever idea that allows one to use sequence manipulations without incurring the costs of manipulating sequences as lists. With streams we can achieve the best of both worlds: We can formulate programs elegantly as sequence manipulations, while attaining the efficiency of incremental computation. The basic idea is to arrange to construct a stream only partially, and to pass the partial construction to the program that consumes the stream. If the consumer attempts to access a part of the stream that has not yet been constructed, the stream will automatically construct just enough more of itself to produce the required part, thus preserving the illusion that the entire stream exists. In other words, although we will write programs as if we were processing complete sequences, we design our stream implementation to automatically and transparently interleave the construction of the stream with its use. Structure and Interpretation of Computer Programs 10
  • 10. Chapter 5 of Functional Programming in Scala explains how lazy lists (streams) can be used to fuse together sequences of transformations. See the next slide for how the book introduces the problem that streams are meant to solve. 11
  • 11. Chapter 5 - Strictness and laziness In chapter 3 we talked about purely functional data structures, using singly linked lists as an example. We covered a number of bulk operations on lists—map, filter, foldLeft, foldRight, zipWith, and so on. We noted that each of these operations makes its own pass over the input and constructs a fresh list for the output. Imagine if you had a deck of cards and you were asked to remove the odd-numbered cards and then flip over all the queens. Ideally, you’d make a single pass through the deck, looking for queens and odd-numbered cards at the same time. This is more efficient than removing the odd cards and then looking for queens in the remainder. And yet the latter is what Scala is doing in the following code: scala> List(1,2,3,4).map(_ + 10).filter(_ % 2 == 0).map(_ * 3) List(36,42) In this expression, map(_ + 10) will produce an intermediate list that then gets passed to filter(_ % 2 == 0), which in turn constructs a list that gets passed to map(_ * 3), which then produces the final list. In other words, each transformation will produce a temporary list that only ever gets used as input to the next transformation and is then immediately discarded. … This view makes it clear how the calls to map and filter each perform their own traversal of the input and allocate lists for the output. Wouldn’t it be nice if we could somehow fuse sequences of transformations like this into a single pass and avoid creating temporary data structures? We could rewrite the code into a while loop by hand, but ideally we’d like to have this done automatically while retaining the same highlevel compositional style. We want to compose our programs using higher-order functions like map and filter instead of writing monolithic loops. It turns out that we can accomplish this kind of automatic loop fusion through the use of non-strictness (or, less formally, laziness). In this chapter, we’ll explain what exactly this means, and we’ll work through the implementation of a lazy list type that fuses sequences of transformations. Although building a “better” list is the motivation for this chapter, we’ll see that non- strictness is a fundamental technique for improving on the efficiency and modularity of functional programs in general. … 5.2 An extended example: lazy lists Let’s now return to the problem posed at the beginning of this chapter. We’ll explore how laziness can be used to improve the efficiency and modularity of functional programs using lazy lists, or streams, as an example. We’ll see how chains of transformations on streams are fused into a single pass through the use of laziness. Here’s a simple Stream definition… … Functional Programming In Scala Paul Chiusano Runar Bjarnason @pchiusano @runarorama Michael Pilquist @mpilquist 12
  • 12. With that refresher on (or introduction to) streams out of the way, let’s get back to the actual subject of this deck: views. In my view (pun not intended) a great book to get started with is Li Haoyi’s Hands-On Scala Programming. I find its approach to introducing views unique, and I love how he includes diagrams to illustrate the concept. @philip_schwarz 13
  • 13. 4.1.3 Transforms @ Array(1, 2, 3, 4, 5).map(i => i * 2) // Multiply every element by 2 res7: Array[Int] = Array(2, 4, 6, 8, 10) @ Array(1, 2, 3, 4, 5).filter(i => i % 2 == 1) // Keep only elements not divisible by 2 res8: Array[Int] = Array(1, 3, 5) @ Array(1, 2, 3, 4, 5).take(2) // Keep first two elements res9: Array[Int] = Array(1, 2) @ Array(1, 2, 3, 4, 5).drop(2) // Discard first two elements res10: Array[Int] = Array(3, 4, 5) @ Array(1, 2, 3, 4, 5).slice(1, 4) // Keep elements from index 1-4 res11: Array[Int] = Array(2, 3, 4) @ Array(1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5, 6, 7, 8).distinct // Removes all duplicates res12: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8) @ val a = Array(1, 2, 3, 4, 5) a: Array[Int] = Array(1, 2, 3, 4, 5) @ val a2 = a.map(x => x + 10) a2: Array[Int] = Array(11, 12, 13, 14, 15) @ a(0) // Note that `a` is unchanged! res15: Int = 1 @ a2(0) res16: Int = 11 Li Haoyi @lihaoyi The copying involved in these collection transformations does have some overhead, but in most cases that should not cause issues. If a piece of code does turn out to be a bottleneck that is slowing down your program, you can always convert your .map/.filter/etc. transformation code into mutating operations over raw Arrays or In-Place Operations (4.3.4) over Mutable Collections (4.3) to optimize for performance. Transforms take an existing collection and create a new collection modified in some way. Note that these transformations create copies of the collection, and leave the original unchanged. That means if you are still using the original array, its contents will not be modified by the transform. 14
  • 14. 4.1.6 Combining Operations It is common to chain more than one operation together to achieve what you want. For example, here is a function that computes the standard deviation of an array of numbers: @ def stdDev(a: Array[Double]): Double = { val mean = a.foldLeft(0.0)(_ + _) / a.length val squareErrors = a.map(_ - mean).map(x => x * x) math.sqrt(squareErrors.foldLeft(0.0)(_ + _) / a.length) } Scala collections provide a convenient helper method .sum that is equivalent to .foldLeft(0.0)(_ + _), so the above code can be simplified to… As another example, here is a function that… Chaining collection transformations in this manner will always have some overhead, but for most use cases the overhead is worth the convenience and simplicity that these transforms give you. If collection transforms do become a bottleneck, you can optimize the code using Views (4.1.8), In-Place Operations (4.3.4), or finally by looping over the raw Arrays yourself. Li Haoyi @lihaoyi 4.1.8 Views When you chain multiple transformations on a collection, we are creating many intermediate collections that are immediately thrown away. For example, in the following snippet: @ val myArray = Array(1, 2, 3, 4, 5, 6, 7, 8, 9) @ val myNewArray = myArray.map(x => x + 1).filter(x => x % 2 == 0).slice(1, 3) myNewArray: Array[Int] = Array(4, 6) The chain of .map .filter .slice operations ends up traversing the collection three times, creating three new collections, but only the last collection ends up being stored in myNewArray and the others are discarded. 15
  • 15. This creation and traversal of intermediate collections is wasteful. In cases where you have long chains of collection transformations that are becoming a performance bottleneck, you can use the .view method together with .to to "fuse" the operations together: @ val myNewArray = myArray.view.map(_ + 1).filter(_ % 2 == 0).slice(1, 3).to(Array) myNewArray: Array[Int] = Array(4, 6) Using .view before the map/filter/slice transformation operations defers the actual traversal and creation of a new collection until later, when we call .to to convert it back into a concrete collection type: This allows us to perform this chain of map/filter/slice transformations with only a single traversal, and only creating a single output collection. This reduces the amount of unnecessary processing and memory allocations. Li Haoyi @lihaoyi 1 2 3 4 5 6 7 8 9 2 3 4 5 6 7 8 9 10 2 4 6 8 10 4 6 1 2 3 4 5 6 7 8 9 4 6 map(x => x + 1) filter(x => x % 2 == 0) slice(1, 3) myArray myNewArray view map filter slice to 16
  • 16. The next slide is just a quick visual recap. 17
  • 17. val myNewArray = myArray .map(_ + 1) .filter(_ % 2 == 0) .slice(1, 3) val myNewArray = myArray .view .map(_ + 1) .filter(_ % 2 == 0) .slice(1, 3) .to(Array) 1 2 3 4 5 6 7 8 9 2 3 4 5 6 7 8 9 10 2 4 6 8 10 4 6 myArray myNewArray 1 2 3 4 5 6 7 8 9 4 6 myArray myNewArray map(_ + 1) filter(_ % 2 == 0) slice(1, 3) view map filter slice to • 1 traversal • 1 collection created (no intermediate ones) • call to view fuses map filter and slice together • traversal is deferred until conversion to a concrete collection using .to • 3 traversals • 3 collections created (2 intermediate ones) 18
  • 18. Next, let’s turn to ‘The’ Scala book. 19
  • 19. 24.13 Views Collections have quite a few methods that construct new collections. Some examples are map, filter, and ++. We call such methods transformers because they take at least one collection as their receiver object and produce another collection in their result. Transformers can be implemented in two principal ways: strict and nonstrict (or lazy). A strict transformer constructs a new collection with all of its elements. A non-strict, or lazy, transformer constructs only a proxy for the result collection, and its elements are constructed on demand. As an example of a non-strict transformer, consider the following implementation of a lazy map operation: def lazyMap[T, U](col: Iterable[T], f: T => U) = new Iterable[U]: def iterator = col.iterator.map(f) Note that lazyMap constructs a new Iterable without stepping through all elements of the given collection coll. The given function f is instead applied to the elements of the new collection’s iterator as they are demanded. Scala collections are by default strict in all their transformers, except for LazyList, which implements all its transformer methods lazily. However, there is a systematic way to turn every collection into a lazy one and vice versa, which is based on collection views. A view is a special kind of collection that represents some base collection, but implements all of its transformers lazily. To go from a collection to its view, you can use the view method on the collection. If xs is some collection, then xs.view is the same collection, but with all transformers implemented lazily. To get back from a view to a strict collection, you can use the to conversion operation with a strict collection factory as parameter. Martin Odersky @odersky 20
  • 20. As an example, say you have a vector of Ints over which you want to map two functions in succession: val v = Vector((1 to 10)*) // Vector(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) v.map(_ + 1).map(_ * 2) // Vector(4, 6, 8, 10, 12, 14, 16, 18, 20, 22) In the last statement, the expression v.map(_ + 1) constructs a new vector that is then transformed into a third vector by the second call to map(_ * 2). In many situations, constructing the intermediate result from the first call to map is a bit wasteful. In the pseudo example, it would be faster to do a single map with the composition of the two functions (_ + 1) and (_ * 2). If you have the two functions available in the same place you can do this by hand. But quite often, successive transformations of a data structure are done in different program modules. Fusing those transformations would then undermine modularity. A more general way to avoid the intermediate results is by turning the vector first into a view, applying all transformations to the view, and finally forcing the view to a vector: (v.view.map(_ + 1).map(_ * 2)).toVector // Vector(4, 6, 8, 10, 12, 14, 16, 18, 20, 22) We’ll do this sequence of operations again, one by one: scala> val vv = v.view val vv: scala.collection.IndexedSeqView[Int] = IndexedSeqView(<not computed>) The application v.view gives you an IndexedSeqView, a lazily evaluated IndexedSeq. As with LazyList, toString on views does not force the view elements. That’s why the vv’s elements are displayed as not computed. Bill Venners @bvenners 21
  • 21. Applying the first map to the view gives you: scala> vv.map(_ + 1) val res13: scala.collection.IndexedSeqView[Int] = IndexedSeqView(<not computed>) The result of the map is another IndexedSeqView[Int] value. This is in essence a wrapper that records the fact that a map with function (_ + 1) needs to be applied on the vector v. It does not apply that map until the view is forced, however. We’ll now apply the second map to the last result. scala> res13.map(_ * 2) val res14: scala.collection.IndexedSeqView[Int] = IndexedSeqView(<not computed>) Finally, forcing the last result gives: scala> res14.toVector val res15: Seq[Int] = Vector(4, 6, 8, 10, 12, 14, 16, 18, 20, 22) Both stored functions, (_ + 1) and (_ * 2), get applied as part of the execution of the to operation and a new vector is constructed. That way, no intermediate data structure is needed. Transformation operations applied to views don’t build a new data structure. Instead, they return an iterable whose iterator is the result of applying the transformation operation to the underlying collection’s iterator. The main reason for using views is performance. You have seen that by switching a collection to a view the construction of intermediate results can be avoided. These savings can be quite important. Lex Spoon 22
  • 22. As another example, consider the problem of finding the first palindrome in a list of words. A palindrome is a word that reads backwards the same as forwards. Here are the necessary definitions: def isPalindrome(x: String) = x == x.reverse def findPalindrome(s: Iterable[String]) = s.find(isPalindrome) Now, assume you have a very long sequence, words, and you want to find a palindrome in the first million words of that sequence. Can you re-use the definition of findPalindrome? Of course, you could write: findPalindrome(words.take(1000000)) This nicely separates the two aspects of taking the first million words of a sequence and finding a palindrome in it. But the downside is that it always constructs an intermediary sequence consisting of one million words, even if the first word of that sequence is already a palindrome. So potentially, 999,999 words are copied into the intermediary result without being inspected at all afterwards. Many programmers would give up here and write their own specialized version of finding palindromes in some given prefix of an argument sequence. But with views, you don’t have to. Simply write: findPalindrome(words.view.take(1000000)) This has the same nice separation of concerns, but instead of a sequence of a million elements it will only construct a single lightweight view object. This way, you do not need to choose between performance and modularity. After having seen all these nifty uses of views you might wonder why have strict collections at all? One reason is that performance comparisons do not always favor lazy over strict collections. For smaller collection sizes the added overhead of forming and applying closures in views is often greater than the gain from avoiding the intermediary data structures. A possibly more important reason is that evaluation in views can be very confusing if the delayed operations have side effects. Frank Sommers 23
  • 23. In the next excerpt, Sergei Winitzki’s coverage of views, in the Science of Functional Programming, is short and sweet, but also nice and practical, with an emphasis on memory usage. 24
  • 24. The code (1L to 1_000_000_000L).sum works because (1 to n) produces a sequence whose elements are computed whenever needed but do not remain in memory. This can be seen as a sequence with the “on-call” availability of elements. Sequences of this sort are called iterators: scala> 1 to 5 res0: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5) scala> 1 until 5 res1: scala.collection.immutable.Range = Range(1, 2, 3, 4) The types Range and Range.Inclusive are defined in the Scala standard library and are iterators. They behave as collections and support the usual methods (map, filter, etc.), but they do not store previously computed values in memory. The view method Eager collections such as List or Array can be converted to iterators by using the view method. This is necessary when intermediate collections consume too much memory when fully evaluated. For example, consider the computation of Example 2.1.5.7 where we used flatMap to replace each element of an initial sequence by three new numbers before computing max of the resulting collection. If instead of three new numbers we wanted to compute three million new numbers each time, the intermediate collection created by flatMap would require too much memory, and the computation would crash: scala> (1 to 10).flatMap(x => 1 to 3_000_000).max java.lang.OutOfMemoryError: GC overhead limit exceeded Even though the range (1 to 10) is an iterator, a subsequent flatMap operation creates an intermediate collection that is too large for our computer’s memory. We can use view to avoid this: scala> (1 to 10).view.flatMap(x => 1 to 3_000_000).max res0: Int = 3_000_000 … Sergei Winitzki sergei-winitzki-11a6431 25 NOTES (from ‘Programming in Scala’): In Scala versions before 2.8, the Range type was lazy, so it behaved in effect like a view. Since 2.8, all collections except lazy lists and views are strict. The only way to go from a strict to a lazy collection is via the view method. The only way to go back is via to.
  • 25. And last, but not by no means least, let’s look at Alvin Alexander’s Scala Cookbook recipe for Creating a Lazy View on a Collection. 26
  • 26. 11.4 Creating a Lazy View on a Collection Problem You’re working with a large collection and want to create a lazy version of it so it will only compute and return results as they are needed. Solution Create a view on the collection by calling its view method. That creates a new collection whose transformer methods are implemented in a nonstrict, or lazy, manner. For example, given a large list: val xs = List.range(0, 3_000_000) // a list from 0 to 2,999,999 imagine that you want to call several transformation methods on it, such as map and filter. This is a contrived example, but it demonstrates a problem: val ys = xs.map(_ + 1).map(_ * 10).filter(_ > 1_000).filter(_ < 10_000) If you attempt to run that example in the REPL, you’ll probably see this fatal “out of memory” error: scala> val ys = xs.map(_ + 1) java.lang.OutOfMemoryError: GC overhead limit exceeded Conversely, this example returns almost immediately and doesn’t throw an error because all it does is create a view and then four lazy transformer methods: val ys = xs.view.map(_ + 1).map(_ * 10).filter(_ > 1_000).filter(_ < 10_000) // result: ys: scala.collection.View[Int] = View(<not computed>) Now you can work with ys without running out of memory: scala> ys.take(3).foreach(println) 1010 1020 1030 Calling view on a collection makes the resulting collection lazy. Now when transformer methods are called on the view, the elements will only be calculated as they are accessed, and not “eagerly,” as they normally would be with a strict collection. Alvin Alexander @alvinalexander 27
  • 27. Discussion … The use case for views The main use case for using a view is performance, in terms of speed, memory, or both. Regarding performance, the example in the Solution first demonstrates (a) a strict approach that runs out of memory, and then (b) a lazy approach that lets you work with the same dataset. The problem with the first solution is that it attempts to create new, intermediate collections each time a transformer method is called: val b = a.map(_ + 1) // 1st copy of the data .map(_ * 10) // 2nd copy of the data .filter(_ > 1_000) // 3rd copy of the data .filter(_ < 10_000) // 4th copy of the data If the initial collection a has one billion elements, the first map call creates a new intermediate collection with another billion elements. The second map call creates another collection, so now we’re attempting to hold three billion elements in memory, and so on. To drive that point home, that approach is the same as if you had written this: val a = List.range(0, 1_000_000_000) // 1B elements in RAM val b = a.map(_ + 1) // 1st copy of the data (2B elements in RAM) val c = b.map(_ * 10) // 2nd copy of the data (3B elements in RAM) val d = c.filter(_ > 1_000) // 3rd copy of the data (~4B total) val e = d.filter(_ < 10_000) // 4th copy of the data (~4B total) Conversely, when you immediately create a view on the collection, everything after that essentially just creates an iterator: val ys = a.view.map ... // this DOES NOT create another one billion elements As usual with anything related to performance, be sure to test using a view versus not using a view in your application to find what works best. Another performance-related reason to understand views is that it’s become very common to work with large datasets in a streaming manner, and views work very similar to streams. … Alvin Alexander @alvinalexander 28
  • 28. That’s all. I hope you found it useful. @philip_schwarz 29