Collections in Clojure
Jan Herich
2014-03-19 Mon
Scala user-group-19.03.2014

Intro to clojure collections & collection abstractions

  1. Collections in Clojure. Jan Herich, 2014-03-19 Mon
  2. Outline

     1 Basic Clojure collection types
     2 Persistent characteristics of Clojure collections
     3 Sequence abstraction and laziness
     4 Reducers - better performance and parallelism
  3. Basic Clojure collection types: Lists

     • The list data structure is implemented as an ordinary singly linked list.
     • Lists are special because they are used to compose Clojure programs.
     • Unquoted lists are treated as function calls by the Clojure evaluator.

         ;; list literal representation
         '(1 2 :id (3 4) "name")
         ;; unquoted list interpreted
         ;; as function call
         (= (+ 1 2) 3)
         ;; get the first element
         (= (peek '(1 2 3)) 1)
         ;; new list from old one
         (= (pop '(1 2 3)) '(2 3))
         ;; conj adds to the front of a list
         (= (conj '(3 2 1) 4) '(4 3 2 1))
  4. Basic Clojure collection types: Sets

     • Sets are collections of unique elements.
     • Like every collection in Clojure, sets can be heterogeneous.
     • Fast membership test.

         ;; set literal representation
         #{1 :id :type "name"}
         ;; testing membership
         (= true (contains? #{1 2} 2))
         ;; new set from old one
         (= (disj #{1 2 3} 2) #{1 3})
         (= (conj #{1 3} 2) #{1 2 3})
  5. Basic Clojure collection types: Maps

     • Maps are the basic construct for holding structured information.
     • The default implementation uses the well-known hash-map mechanism.
     • Fast look-up.

         ;; map literal representation
         {:id 1 :name "John"}
         ;; optional comma delimiters
         {:id 1, :name "John"}
         ;; lookup
         (= (get {:id 1 :name "John"} :id) 1)
         ;; new map from old one
         (= (dissoc {:id 1 :name "John"} :name) {:id 1})
         (= (assoc {:id 1} :name "John") {:id 1 :name "John"})
  6. Basic Clojure collection types: Vectors

     • A vector is the right structure for ordered data where random look-up is necessary.
     • Fast look-up by index.
     • Maintains the ordering of elements.

         ;; vector literal representation
         [1 2 3 4 5]
         ;; lookup by zero based index
         (= (get [1 2 3] 2) 3)
         ;; new vector from old one
         (= (subvec [1 2 3 4 5] 2) [3 4 5])
         (= (conj [1 2 3] 4) [1 2 3 4])
         (= (assoc [1 3] 0 2) [2 3])
  7. Persistent characteristics of Clojure collections: Non-destructive updates

     • All Clojure persistent collections support functional, non-destructive updates instead of in-place mutation of data.
     • Simple defensive copying cannot make updates with such semantics fast and memory efficient, because every update would have to copy the whole collection.
     • Luckily, there is a technique called structural sharing which can help us.
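A minimal sketch of what a non-destructive update looks like in practice. The var names (`original`, `updated`, `v`, `v2`) are made up for illustration and do not come from the slides. Each "update" returns a new collection and leaves the old value untouched.

```clojure
;; Illustrative names only; not part of the original slides.
(def original {:id 1 :name "John"})

;; assoc returns a new map; `original` is not modified.
(def updated (assoc original :name "Jane" :age 30))

(println original) ; the old value is still intact
(println updated)

;; The same holds for the other persistent collections, e.g. vectors.
(def v [1 2 3])
(def v2 (conj v 4))
(println v v2)
```

Because the old value never changes, it can be handed to other code or other threads without defensive copies.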
  8. Persistent characteristics of Clojure collections: Example of structural sharing

     [Figure: structural sharing, with panels "Before update" and "After update"]
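Structural sharing can also be observed from code with `identical?`, which compares object identity rather than value. This is only a sketch, assuming the standard persistent list and hash-map implementations; the names `base`, `extended`, `m` and `m2` are hypothetical.

```clojure
;; Illustrative names only; not part of the original slides.
(def base (list 2 3 4))
(def extended (conj base 1)) ; a new list with 1 in front of base's elements

;; The tail of the new list is the very same object as the old list,
;; so nothing was copied.
(println (identical? (rest extended) base))

(def m {:big-value (vec (range 1000))})
(def m2 (assoc m :other 1))

;; The large vector is shared between both maps, not duplicated.
(println (identical? (:big-value m) (:big-value m2)))
```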
  9. Sequence abstraction and laziness: Sequence as a powerful abstraction for collections

     • A sequence is a logical list: a persistent and immutable view of the collection.
     • All core Clojure collections provide sequence implementations.
     • Most core Clojure functions for transforming collections, like filter or map, are defined in terms of sequences.
     • This is very handy when composing collection transformations.
  10. Sequence abstraction and laziness: Sequences explained

      You can call seq on any Clojure collection, which yields a sequence implementation appropriate to the collection. This implementation provides the following basic guarantees (which are defined in terms of the ISeq interface under the hood):

          ;; Returns the first item in the collection. Calls seq
          ;; on its argument. If coll is nil, returns nil
          (first coll)
          ;; Returns a sequence of the items after the first.
          ;; Calls seq on its argument. If there are no more items,
          ;; returns a logical sequence for which seq returns nil
          (rest coll)
          ;; Returns a new seq where item is the first element
          ;; and seq is the rest
          (cons item seq)
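The guarantees described for `seq`, `first`, `rest` and `cons` can be tried out on concrete collections. The following is a small illustrative sketch, not part of the original slides.

```clojure
;; seq gives a sequence view of any collection.
(println (seq [1 2 3]))    ; vector
(println (seq {:id 1}))    ; map: a sequence of key/value entries
(println (seq "abc"))      ; strings are seqable too
(println (seq []))         ; an empty collection yields nil

(println (first [1 2 3]))
(println (rest [1 2 3]))
(println (rest []))        ; an empty sequence, not nil
(println (seq (rest [])))  ; calling seq on it returns nil
(println (cons 0 [1 2 3]))
```

The difference between `(rest [])` and `(seq (rest []))` is why idiomatic code tests for "no more items" with `seq`.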
  11. Sequence abstraction and laziness: How Clojure leverages sequences

      As already mentioned, many Clojure functions are defined in terms of sequences. For example, have a look at this greatly simplified map implementation:

          (defn map [f coll]
            (when-let [s (seq coll)]
              (cons (f (first s)) (map f (rest s)))))

      This enables the map function to operate on any collection which satisfies the sequence interface, because map calls seq on its second (coll) argument. Notice that map returns a sequence as well, so functions operating on sequences can be easily composed together.
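To illustrate the point about `seq`, here is the same call applied to several collection types. This sketch uses the real `clojure.core/map`, not the simplified version from the slide.

```clojure
;; One function, many collection types: each argument is turned
;; into a sequence via seq before it is traversed.
(println (map inc [1 2 3]))                     ; vector
(println (map inc '(1 2 3)))                    ; list
(println (sort (map inc #{1 2 3})))             ; set (unordered, sorted for display)
(println (map (fn [[k v]] [k (* 2 v)]) {:a 1})) ; map entries

;; Since map returns a sequence, its result feeds straight into filter.
(println (filter even? (map inc [1 2 3 4])))
```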
  12. Sequence abstraction and laziness: Composing collection transformations

          ;; filter countries, calculate densities and sort them
          (->> '({:code "SK" :area 49035 :population 5415949}
                 {:code "CZ" :area 78866 :population 10513209}
                 {:code "AT" :area 83855 :population 8414638}
                 {:code "HU" :area 93030 :population 9908798})
               (filter (fn [country] (> (get country :area) 80000)))
               (map (fn [country]
                      (assoc country :density
                             (double (/ (get country :population)
                                        (get country :area))))))
               (sort-by (fn [country] (get country :density))))
  13. Sequence abstraction and laziness: Laziness

      • As it turns out, it's very easy to express infinite sequences, just by defining a recursive relation between sequence elements.
      • Clojure gives us many functions which produce infinite sequences, such as iterate.

          ;; infinite stream of ascending numbers from zero
          (iterate inc 0)
          ;; to avoid blocking the consuming thread, use take
          (take 10 (iterate inc 0))

      • To be able to express such infinite sequences, we need laziness.
      • In fact, most Clojure core sequence functions (for example map) are lazy, so they can consume and produce lazy sequences.
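One way to see laziness at work is to count how often the mapping function actually runs. This is an illustrative sketch; the names `calls` and `doubled` are hypothetical. It relies on `iterate` producing an unchunked sequence, so elements are computed one at a time.

```clojure
;; Illustrative names only; not part of the original slides.
(def calls (atom 0)) ; counts invocations of the mapping function

(def doubled
  (map (fn [x] (swap! calls inc) (* 2 x))
       (iterate inc 0))) ; infinite input, yet this returns immediately

(println @calls)                   ; nothing has been computed so far
(println (doall (take 3 doubled))) ; forces only the first three elements
(println @calls)                   ; only the consumed elements were computed
```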
  14. Sequence abstraction and laziness: How to express laziness in Clojure

          ;; define Fibonacci numbers as a lazy sequence with
          ;; the help of the lazy-seq macro
          (defn fib [a b]
            (cons a (lazy-seq (fib b (+ a b)))))
          ;; consume the first ten numbers from the sequence
          (take 10 (fib 0 1))
          ;; map is lazy as well
          (take 10 (map (fn [x] (* 3 x)) (fib 0 1)))
  15. Reducers - better performance and parallelism: Reducers, or another useful collection abstraction

      Why another abstraction if we already have sequences?

      1 Laziness is great when we need it, but we don't always need it.
      2 A sequence is fundamentally serial.
      3 Those two points are problems if we want a high-performing solution which can easily exploit parallelism.

      • Therefore, we need a new notion of a collection, even simpler than the sequence abstraction.
      • The new, minimalist notion of a collection is something which is reducible.
  16. Reducers - better performance and parallelism: How reducible is defined

      It's important to understand the reduce function:

          ;; this is a simplified definition of reduce
          (defn reduce [f init coll]
            (if-let [s (seq coll)]
              (reduce f (f init (first s)) (rest s))
              init))
          ;; this is how we call reduce with a reducing function
          (reduce (fn [accumulator item] (* accumulator item))
                  1
                  '(1 2 3 4 5 6 7))

      A reducible is something which can reduce itself; we are not interested in the actual mechanism.
  17. Reducers - better performance and parallelism: Digging deeper into reducers

      • Reducers are about the transformation of reducing functions.

          ;; new simplified definition of map
          (defn mapping [f]
            (fn [f1]
              (fn [accumulator item]
                (f1 accumulator (f item)))))

      • The reducers library offers alternatives to the sequence functions, defined similarly to mapping above: as higher-order functions which transform the reducing step to include the logic of mapping, filtering, etc.
      • What's particularly nice is that those functions consist only of the core logic of their operations.
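To make the idea of transforming reducing functions more concrete, here is a sketch that adds a `filtering` function in the same style as `mapping` and composes the two. `filtering` and `step` are hypothetical names introduced for illustration; they are not the reducers library's own definitions.

```clojure
;; mapping as defined on the slide
(defn mapping [f]
  (fn [f1]
    (fn [accumulator item]
      (f1 accumulator (f item)))))

;; Hypothetical companion in the same style: the item is passed on
;; to the wrapped reducing function only if it satisfies pred.
(defn filtering [pred]
  (fn [f1]
    (fn [accumulator item]
      (if (pred item)
        (f1 accumulator item)
        accumulator))))

;; Keep the even items, increment them, then sum: one pass, no
;; intermediate sequence.
(def step ((filtering even?) ((mapping inc) +)))
(println (reduce step 0 [1 2 3 4]))
```

Each transformation only wraps the reducing function it is given, so the same `mapping` works with `+`, `conj` or any other reducing function.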
  18. Reducers - better performance and parallelism: Applying reducers

      • If we keep the definition of mapping from the previous slide, our code would look a little strange:

          ;; our sequence based code
          (reduce + 0 (map (fn [x] (* x 3)) '(1 2 3)))
          ;; and equivalent reducers based code
          (reduce ((mapping (fn [x] (* x 3))) +) 0 '(1 2 3))

      • Luckily, we are in Lisp land, so the reducers library handles such details with the help of macros, and we work with functions which have the same shape as before:

          ;; require reducers library
          (require '[clojure.core.reducers :as r])
          ;; use it
          (reduce + 0 (r/map (fn [x] (* x 3)) '(1 2 3)))
  19. Reducers - better performance and parallelism: What we gain and what we lose

      • Reducers are faster and more memory efficient than their sequence-based counterparts, especially when several transformations are chained (have a look at slide 12), because no intermediate sequences are produced.
      • This is because composing reducer functions merely creates a recipe for a future reduction; no work is done until reduce is called.
      • We lose laziness in the process, so we can't write this expression with reducers anymore:

          (take 10 (r/map (fn [x] (* 3 x)) (fib 0 1)))

        (Clojure will throw an exception when the result is consumed because, unlike the normal map, r/map doesn't return a sequence.)
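The "recipe" behaviour can be sketched as follows. The name `recipe` is made up for illustration; `r/map`, `r/filter`, `reduce` and `into` are the standard functions.

```clojure
(require '[clojure.core.reducers :as r])

;; Composing reducers only builds a recipe; no element is processed here.
(def recipe (r/map inc (r/filter even? [1 2 3 4 5 6])))

;; The work happens in a single pass once the recipe is reduced.
(println (reduce + 0 recipe))

;; into is built on reduce, so it can turn the recipe into a concrete vector.
(println (into [] recipe))
```

Using `into` is a common way to get an ordinary collection back when a reducer pipeline has to be handed to sequence-based code.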
  20. Reducers - better performance and parallelism: Enter parallelism

      • With reducers, core collection operations are freed from laziness and representation, but we are still stuck with the reduce function, which is serial as well.
      • But we can parallelize a reduction by using independent sub-reductions and combining their results.
      • There is a function which does just that: fold.
      • fold takes a combining function, a reducing function and a collection, and returns the result of combining the results of reducing sub-segments of the collection, potentially in parallel.
  21. Reducers - better performance and parallelism: Fold example

          (require '[clojure.core.reducers :as r])
          ;; we use the same combine and reduce function
          (r/fold + + [1 2 3 4 5 6])
          ;; when this is the case, it's enough to supply
          ;; just the reducing function and fold will use it
          ;; to combine the sub-reductions
          (r/fold + [1 2 3 4 5 6])
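A slightly larger sketch of `fold`, including a case where the combining and reducing functions differ. The names `numbers` and `evens` are hypothetical. `r/monoid` is the library helper that builds a combining function whose zero-argument call supplies the initial value for each sub-reduction.

```clojure
(require '[clojure.core.reducers :as r])

;; Illustrative names only. A vector is used because it can be
;; subdivided for parallel folding.
(def numbers (vec (range 100000)))

;; Same function for reducing and combining: sum of the incremented numbers.
(println (r/fold + (r/map inc numbers)))

;; Different functions: conj builds a vector per segment, and the
;; combining function concatenates the partial vectors with into.
;; (vector) with no arguments supplies the empty starting vector.
(def evens
  (r/fold (r/monoid into vector) conj (r/filter even? numbers)))
(println (count evens))
```

Whether the fold actually runs in parallel depends on the collection size and the available cores; the result is the same either way.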
  22. Reducers - better performance and parallelism: Conclusion

      • fold takes advantage of collections which are amenable to parallel subdivision. Ideal candidates are trees, such as Clojure vectors and maps.
      • The parallel implementations of fold for those collections are based upon the Java Fork/Join framework.
      • If the underlying collection is not suited for parallel subdivision (as is the case with a sequence), fold just devolves into reduce.
  23. Reducers - better performance and parallelism: The End

      Thank you for your attention. I hope this presentation sparked your interest in Clojure, in which case, visit www.clojure.org and learn more!
