Scala user-group-19.03.2014
Intro to clojure collections & collection abstractions


Published in Technology
Transcript

  • 1. Collections in Clojure. Jan Herich, 2014-03-19 (Mon)
  • 2. Outline
      1. Basic Clojure collection types
      2. Persistent characteristics of Clojure collections
      3. Sequence abstraction and laziness
      4. Reducers - better performance and parallelism
  • 3. Basic Clojure collection types: Lists
    The list data structure is implemented as an ordinary singly linked list.
    Lists are special because they are used to compose Clojure programs.
    Unquoted lists are evaluated as function calls by the Clojure environment.
      ;; list literal representation
      '(1 2 :id (3 4) "name")
      ;; unquoted list interpreted as function call
      (= (+ 1 2) 3)
      ;; get the first element
      (= (peek '(1 2 3)) 1)
      ;; new list from old one
      (= (pop '(1 2 3)) '(2 3))
      (= (conj '(3 2 1) 4) '(4 3 2 1))
  • 4. Basic Clojure collection types: Sets
    Sets are collections of unique elements.
    Like every collection in Clojure, sets can be heterogeneous.
    Fast membership test.
      ;; set literal representation
      #{1 :id :type "name"}
      ;; testing membership
      (= true (contains? #{1 2} 2))
      ;; new set from old one
      (= (disj #{1 2 3} 2) #{1 3})
      (= (conj #{1 3} 2) #{1 2 3})
  • 5. Basic Clojure collection types: Maps
    Maps are the basic construct for holding structured information.
    The default implementation uses the well-known hash-map mechanism.
    Fast look-up.
      ;; map literal representation
      {:id 1 :name "John"}
      ;; optional comma delimiters
      {:id 1, :name "John"}
      ;; lookup
      (= (get {:id 1 :name "John"} :id) 1)
      ;; new map from old one
      (= (dissoc {:id 1 :name "John"} :name) {:id 1})
      (= (assoc {:id 1} :name "John") {:id 1 :name "John"})
  • 6. Basic Clojure collection types: Vectors
    A vector is the right structure for ordered data where random look-up is necessary.
    Fast look-up by index.
    Maintains the ordering of elements.
      ;; vector literal representation
      [1 2 3 4 5]
      ;; lookup by zero-based index
      (= (get [1 2 3] 2) 3)
      ;; new vector from old one
      (= (subvec [1 2 3 4 5] 2) [3 4 5])
      (= (conj [1 2 3] 4) [1 2 3 4])
      (= (assoc [1 3] 0 2) [2 3])
  • 7. Persistent characteristics of Clojure collections: Non-destructive updates
    All Clojure persistent collections support functional, non-destructive updates instead of in-place mutation of data.
    Simple defensive copying cannot make such updates fast and memory efficient, because copying the whole collection on every update costs time and memory proportional to its size.
    Luckily, there is a technique called structural sharing, which can help us.
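The observable side of non-destructive updates can be sketched as follows. The names `v1`, `v2`, `m1`, `m2` and the sample data are illustrative assumptions, not taken from the slides. The snippet only shows that untouched values are reused by reference. The sharing of internal tree nodes happens in the same spirit, but it is not visible from ordinary Clojure code.

```clojure
;; Illustrative names and data, not from the original slides.
(def v1 [1 2 3])
(def v2 (conj v1 4))   ; a new vector; v1 is left untouched

(def m1 {:id 1 :address {:city "Bratislava"}})
(def m2 (assoc m1 :name "John"))   ; a new map; m1 is left untouched

(println v1 v2)
;; The nested :address map was not changed by the update, so both
;; maps refer to the very same object instead of holding two copies.
(println (identical? (:address m1) (:address m2)))
```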
  • 8. Persistent characteristics of Clojure collections: Example of structural sharing
    [Figure: a collection before the update and after the update]
  • 9. Sequence abstraction and laziness: Sequence as a powerful abstraction for collections
    A sequence is a logical list: a persistent and immutable view of a collection.
    All core Clojure collections provide sequence implementations.
    Most core Clojure functions for transforming collections, like filter or map, are defined in terms of sequences.
    This is very handy when composing collection transformations.
  • 10. Sequence abstraction and laziness: Sequences explained
    You can call seq on any Clojure collection, which yields a sequence implementation appropriate to the collection. This implementation provides the following basic operations (which are defined in terms of the ISeq interface under the hood):
      ;; Returns the first item in the collection. Calls seq
      ;; on its argument. If coll is nil, returns nil
      (first coll)
      ;; Returns a sequence of the items after the first.
      ;; Calls seq on its argument. If there are no more items,
      ;; returns a logical sequence for which seq returns nil
      (rest coll)
      ;; Returns a new seq where item is the first element
      ;; and seq is the rest
      (cons item seq)
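A few concrete calls may make these operations easier to picture. This is a small sketch with made-up sample collections. The results in the comments follow from the documented behaviour of seq, first, rest and cons.

```clojure
;; seq yields a sequence view appropriate to each collection type
(seq [1 2 3])     ; => (1 2 3)
(seq #{1})        ; => (1)
(seq {:id 1})     ; => ([:id 1]), a sequence of key/value entries
(seq [])          ; => nil, an empty collection has no sequence

;; first, rest and cons work on any collection, because they
;; call seq on their collection argument
(first {:id 1})   ; => [:id 1]
(rest [1])        ; => (), and (seq (rest [1])) is nil
(cons 0 [1 2])    ; => (0 1 2)
```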
  • 11. Sequence abstraction and laziness: How Clojure leverages sequences
    As already mentioned, many Clojure functions are defined in terms of sequences. For example, have a look at this greatly simplified map implementation:
      (defn map [f coll]
        (when-let [s (seq coll)]
          (cons (f (first s)) (map f (rest s)))))
    This enables the map function to operate on any collection which satisfies the sequence interface, because map calls seq on its second (coll) argument. Notice that map returns a sequence as well, so functions operating on sequences can be easily composed together.
  • 12. Sequence abstraction and laziness: Composing collection transformations
      ;; filter countries, calculate densities and sort them
      (->> '({:code "SK" :area 49035 :population 5415949}
             {:code "CZ" :area 78866 :population 10513209}
             {:code "AT" :area 83855 :population 8414638}
             {:code "HU" :area 93030 :population 9908798})
           (filter (fn [country] (> (get country :area) 80000)))
           (map (fn [country]
                  (assoc country :density
                         (double (/ (get country :population)
                                    (get country :area))))))
           (sort-by (fn [country] (get country :density))))
  • 13. Sequence abstraction and laziness: Laziness
    As it turns out, it’s very easy to express infinite sequences, just by defining a recursive relation between sequence elements.
    Clojure gives us many functions for infinite sequences, such as iterate.
      ;; infinite stream of ascending numbers from zero
      (iterate inc 0)
      ;; to avoid blocking the consuming thread, use take
      (take 10 (iterate inc 0))
    To be able to express such infinite sequences, we need laziness: elements are computed only when they are consumed.
    In fact, most Clojure core functions (for example map) are defined as lazy, so they can consume and produce lazy sequences.
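One way to observe laziness is to count how often the mapping function actually runs. The sketch below is not from the slides. The names `calls`, `triple` and `first-three` are made up for illustration, and the counter exists only to make the evaluation visible.

```clojure
;; Hypothetical helper names; the atom only counts invocations.
(def calls (atom 0))

(defn triple [x]
  (swap! calls inc)   ; record that the function was really invoked
  (* 3 x))

;; map over an infinite sequence; doall forces only the part
;; that take asks for
(def first-three
  (doall (take 3 (map triple (iterate inc 0)))))

(println first-three)
;; The source sequence is infinite, yet the program terminates and
;; the counter stays small: only consumed elements were computed.
(println @calls)
```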
  • 14. Sequence abstraction and laziness: How to express laziness in Clojure
      ;; define fibonacci numbers as a lazy sequence with
      ;; the help of the lazy-seq macro
      (defn fib [a b]
        (cons a (lazy-seq (fib b (+ a b)))))
      ;; consume first ten numbers from the sequence
      (take 10 (fib 0 1))
      ;; map is lazy as well
      (take 10 (map (fn [x] (* 3 x)) (fib 0 1)))
  • 15. Reducers - better performance and parallelism: Reducers, or another useful collection abstraction
    Why another abstraction if we already have sequences?
      1. Laziness is great when we need it, but we don’t always need it.
      2. A sequence is fundamentally serial.
      3. Those two points are problems if we want a high-performing solution which can easily exploit parallelism.
    Therefore, we need to find a new notion of collection, even simpler than the sequence abstraction.
    The new, minimalist notion of a collection is something which is reducible.
  • 16. Reducers - better performance and parallelism: How is reducible defined
    It’s important to understand the reduce function:
      ;; this is a simplified definition of reduce
      (defn reduce [f init coll]
        (if-let [s (seq coll)]
          (reduce f (f init (first s)) (rest s))
          init))
      ;; this is how we call reduce with a reducing function
      (reduce (fn [accumulator item] (* accumulator item)) 1 '(1 2 3 4 5 6 7))
    A reducible is something which can reduce itself; we are not interested in the actual mechanism.
  • 17. Reducers - better performance and parallelism: Digging deeper into reducers
    Reducers are about the transformation of reducing functions.
      ;; new simplified definition of map
      (defn mapping [f]
        (fn [f1]
          (fn [accumulator item]
            (f1 accumulator (f item)))))
    The reducers library offers alternatives to the sequence functions, defined similarly to mapping above: as higher-order functions which transform the reducing step to include the logic of mapping, filtering, etc.
    What’s particularly nice is that those functions consist only of the core logic of their operations.
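Filtering can be written in the same style as mapping. The sketch below repeats the `mapping` definition from the slide so that it runs on its own. The names `filtering`, `step` and `total` are assumptions made for illustration, not part of the reducers library.

```clojure
;; mapping as defined on the slide
(defn mapping [f]
  (fn [f1]
    (fn [accumulator item]
      (f1 accumulator (f item)))))

;; hypothetical counterpart for filter: the reducing step is only
;; invoked for items that satisfy pred
(defn filtering [pred]
  (fn [f1]
    (fn [accumulator item]
      (if (pred item)
        (f1 accumulator item)
        accumulator))))

;; Transformers compose with comp: keep the even items, triple
;; them, then add them up with +
(def step
  ((comp (filtering even?)
         (mapping (fn [x] (* x 3))))
   +))

(def total (reduce step 0 [1 2 3 4]))
(println total)   ; 2 and 4 pass the filter: 6 + 12 = 18
```

Note that no intermediate sequence is built here. The whole pipeline is folded into a single reducing function.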
  • 18. Reducers - better performance and parallelism: Applying reducers
    If we keep the definition of mapping from the previous slide, our code would look a little strange:
      ;; our sequence based code
      (reduce + 0 (map (fn [x] (* x 3)) '(1 2 3)))
      ;; and equivalent reducers based code
      (reduce ((mapping (fn [x] (* x 3))) +) 0 '(1 2 3))
    Luckily, we are in LISP land, so the reducers library handles such details with the help of macros, and we work with functions which have the same shape as before:
      ;; require reducers library
      (require '[clojure.core.reducers :as r])
      ;; use it
      (reduce + 0 (r/map (fn [x] (* x 3)) '(1 2 3)))
  • 19. Reducers - better performance and parallelism: What we gain and what we lose
    Reducers are faster and more memory efficient than their sequence based counterparts, especially when several transformations are chained (have a look at slide 12), because no intermediate sequences are produced.
    This is because composing reducer functions merely creates a recipe for a future reduction; no work is done until reduce is called.
    We lose laziness in the process, so we can’t write this expression with reducers anymore:
      (take 10 (r/map (fn [x] (* 3 x)) (fib 0 1)))
    (this fails with an exception once the result is realized, because unlike the normal map, r/map doesn’t return a sequence)
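As a rough sketch, the filtering and mapping steps of the slide 12 pipeline might look like this with reducers. The names `countries` and `densities` are illustrative. The example uses a vector as input, returns only the densities, and leaves out the sort-by step. Here into plays the role of the final reduce that actually runs the recipe.

```clojure
(require '[clojure.core.reducers :as r])

;; data from slide 12, held in a vector (illustrative name)
(def countries
  [{:code "SK" :area 49035 :population 5415949}
   {:code "CZ" :area 78866 :population 10513209}
   {:code "AT" :area 83855 :population 8414638}
   {:code "HU" :area 93030 :population 9908798}])

;; r/filter and r/map only describe the work; into performs the
;; reduction and collects the results into a vector
(def densities
  (->> countries
       (r/filter (fn [country] (> (:area country) 80000)))
       (r/map (fn [country]
                (double (/ (:population country) (:area country)))))
       (into [])))

(println densities)
```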
  • 20. Reducers - better performance and parallelism: Enter parallelism
    With reducers, core collection operations are freed from laziness and representation, but we are still stuck with the reduce function, which is serial as well.
    We can, however, parallelize a reduction by running independent sub-reductions and combining their results.
    There is a function which does just that: fold.
    fold takes a combining function, a reducing function and a collection, and returns the result of combining the results of reducing sub-segments of the collection, potentially in parallel.
  • 21. Reducers - better performance and parallelism: Fold example
      (require '[clojure.core.reducers :as r])
      ;; we use the same combine and reduce function
      (r/fold + + [1 2 3 4 5 6])
      ;; when this is the case, it's enough to supply
      ;; just the reducing function and fold will use it
      ;; to combine the sub-reductions
      (r/fold + [1 2 3 4 5 6])
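The combining and reducing functions can also differ. The following sketch counts occurrences of items. The names `words`, `count-word`, `merge-counts` and `freqs` are made up for illustration. It relies on `r/monoid`, which builds a combining function that returns an empty map when called with no arguments and merges two partial results otherwise.

```clojure
(require '[clojure.core.reducers :as r])

(def words ["a" "b" "a" "c" "b" "a"])   ; illustrative sample data

;; reducing function: adds one item to a partial result
(defn count-word [freqs word]
  (assoc freqs word (inc (get freqs word 0))))

;; combining function: merges two partial results by adding the
;; counts; with no arguments it yields the identity value, {}
(def merge-counts
  (r/monoid (partial merge-with +) hash-map))

(def freqs (r/fold merge-counts count-word words))
(println freqs)
```

With such a small vector fold will most likely run as a single plain reduction. The split into a combining and a reducing function only pays off on larger inputs.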
  • 22. Reducers - better performance and parallelism: Conclusion
    Fold will take advantage of collections which are amenable to parallel subdivision. Ideal candidates are trees, such as Clojure vectors and maps.
    Parallel implementations of fold for those collections are based upon the Java ForkJoin framework.
    If the underlying collection is not suited for parallel subdivision (as is the case with a sequence), fold just devolves into reduce.
  • 23. Reducers - better performance and parallelism: The End
    Thank you for your attention.
    I hope this presentation sparked your interest in Clojure, in which case, visit www.clojure.org and learn more!