Clojure Reducers / clj-syd Aug 2012
Talk given at the Sydney Clojure User group, August 2012

Transcript

  • 1-2. Reducers: a library and model for collection processing in Clojure... in 20 mins or less. Leonardo Borges (@leonardo_borges) - http://www.leonardoborges.com - http://www.thoughtworks.com (Thursday, 30 August 2012)
  • 3-6. Reducers, huh? Here's the gist: you get parallel versions of reduce, map and filter. Ta-da! I'm done! And well under my 20 min limit :)
  • 7. Alright, alright, I'm kidding
  • 8-9. How do reducers make parallelism possible? Via the JVM's Fork/Join framework and reduction transformers.
  • 10. Before we start: this is bleeding-edge stuff.
    Java requirements: the Fork/Join framework, i.e. Java 7 [1], or Java 6 plus the JSR166 jar [2].
    Clojure requirements: 1.5.0-* (still master on GitHub [3] as of 30/08/2012).
    [1] - http://jdk7.java.net/
    [2] - http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166.jar
    [3] - https://github.com/clojure/clojure
  • 11-18. The Fork/Join framework:
    • Based on divide and conquer
    • Work-stealing algorithm
    • Uses deques (double-ended queues)
    • Progressively divides the workload into tasks, up to a threshold
    • Once a worker finishes one task, it pops another one from its deque
    • After at least two tasks have finished, their results can be combined/joined
    • Idle workers can pop tasks from the deques of workers which fall behind
  • 19. Text is boring
  • 20-42. Fork/Join algorithm, simplified view (diagram slides): the workload is put in "deques" and progressively halved, up to a configured threshold; each worker reduces its own tasks and combines partial results pairwise; idle workers can "steal" items from other workers' deques; combining continues until a single final result remains.
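The divide/reduce/combine flow in the diagram slides can be sketched directly against the JVM's Fork/Join API from Clojure via interop. This is a minimal illustration, not part of the reducers library; `sum-task`, `threshold` and `parallel-sum` are made-up names, and the threshold value is arbitrary.

```clojure
(import '[java.util.concurrent ForkJoinPool RecursiveTask])

(def threshold 512)

(defn sum-task
  "Builds a RecursiveTask that sums the vector v between lo (inclusive)
  and hi (exclusive), halving the range until it is under the threshold."
  [v lo hi]
  (proxy [RecursiveTask] []
    (compute []
      (if (<= (- hi lo) threshold)
        ;; small enough: just reduce sequentially
        (reduce + (subvec v lo hi))
        ;; otherwise fork the left half, run the right half here,
        ;; then join (combine) the two partial sums
        (let [mid   (quot (+ lo hi) 2)
              left  (doto (sum-task v lo mid) (.fork))
              right (sum-task v mid hi)]
          (+ (.invoke right) (.join left)))))))

(defn parallel-sum [v]
  (.invoke (ForkJoinPool.) (sum-task v 0 (count v))))
```

Reducers' fold hides all of this plumbing; the sketch only shows the shape of the work-splitting and the pairwise combine.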
  • 43-45. Let's talk about Reducers.
    Motivations: performance, via less allocation and via parallelism (leveraging Fork/Join).
    Issues: lists and seqs are sequential; map / filter imply order.
  • 46-53. A closer look at what map does:

      ;; a naive map implementation
      (defn map [f coll]
        (if (seq coll)
          (cons (f (first coll)) (map f (rest coll)))
          ()))

    • Recursion
    • Order
    • Laziness (not shown)
    • Consumes a list
    • Builds a list
    Oh, and it also applies the function to each item before putting the result into the new list. This is what mapping means!
  • 54-57. Reduction transformers:
    • The idea is to build map / filter on top of reduce to break from sequentiality
    • map / filter then build nothing and consume nothing
    • They change what reduce means to the collection by transforming the reducing functions
  • 58. What map is really all about:

      (defn mapping [f]
        (fn [f1]
          (fn [result input]
            (f1 result (f input)))))
  • 59. But wait! If map doesn't consume the list any longer, who does?
    • reduce does!
    • Since Clojure 1.4, reduce lets the collection reduce itself (through the CollReduce / CollFold protocols)
    • Think of what this means for tree-like structures such as vectors
    • This is key to leveraging the Fork/Join framework
  • 60-64. Now we can use mapping to create reducing functions:

      (reduce ((mapping inc) +) 0 [1 2 3 4])     ;; 14
      ;; equivalent reducing fn: (fn [result input] (+ result (inc input)))

      (reduce ((mapping inc) conj) [] [1 2 3 4]) ;; [2 3 4 5]
      ;; equivalent reducing fn: (fn [result input] (conj result (inc input)))

    But it feels awkward to use it in this form.
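By analogy with mapping, a filter transformer can be written the same way. This `filtering` helper is an illustration in the spirit of the slides, not code from them:

```clojure
(defn mapping [f]
  (fn [f1]
    (fn [result input]
      (f1 result (f input)))))

(defn filtering [pred]
  (fn [f1]
    (fn [result input]
      (if (pred input)
        (f1 result input)   ; keep the item: delegate to the inner reducing fn
        result))))          ; drop the item: pass the accumulator through unchanged

;; keep the evens, then increment them, collecting into a vector
(reduce ((filtering even?) ((mapping inc) conj)) [] [1 2 3 4])
;; => [3 5]
```

Notice that neither transformer builds or consumes a collection; they only rewrite the reducing function that reduce eventually runs.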
  • 65. What do we have so far?
    • Performance has improved thanks to fewer allocations
    • No intermediary lists need to be built (see Haskell's stream fusion [4])
    • However, reduce is still sequential
    [4] - http://bit.ly/streamFusion
  • 66-73. Enter fold:
    • Takes the sequentiality out of foldl, foldr and reduce
    • Potentially parallel (falls back to standard reduce otherwise)
    • Reduce/combine strategy (think Fork/Join framework):
      • Segments the collection
      • Runs multiple reduces in parallel
      • Uses a combining function to join/reduce the results

      (defn fold [combinef reducef coll] ...)
  • 74-77. The combining function is a monoid: a binary function with an identity element. All the following functions are equivalent monoids:

      +
      (+ 2 3) ; 5
      (+)     ; 0

      (defn my-+
        ([] 0)
        ([a b] (+ a b)))
      (my-+ 2 3) ; 5
      (my-+)     ; 0

      (require '[clojure.core.reducers :as r])
      (def my-+ (r/monoid + (fn [] 0)))
      (my-+ 2 3) ; 5
      (my-+)     ; 0
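Such a monoid is exactly what fold wants as its combining function: called with no arguments it seeds each segment, called with two it merges partial results. A small runnable sketch (the collection and names here are illustrative):

```clojure
(require '[clojure.core.reducers :as r])

;; (r/monoid op ctor) returns a fn that yields the identity (ctor) when
;; called with no args, and combines two partial results otherwise.
(def sum-monoid (r/monoid + (constantly 0)))

(def numbers (vec (range 1000)))

(r/fold sum-monoid + numbers)
;; => 499500, same answer as (reduce + numbers)
```

Since + already behaves as its own monoid ((+) is 0), (r/fold + numbers) would work too; r/monoid matters when the operation and the identity constructor are different functions, as in the word-count example later.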
  • 78. fold by examples:

      ;; all examples assume the reducers library is available as r
      (ns reducers-playground.core
        (:require [clojure.core.reducers :as r]))
  • 79-87. fold by examples: increment all even positive integers up to 10 million and add them all up

      ;; these were taken from Rich's reducers talk
      (def my-vector (into [] (range 10000000)))

      (time (reduce + (map inc (filter even? my-vector))))
      ;; 500 msecs

      (time (reduce + (r/map inc (r/filter even? my-vector))))
      ;; 260 msecs

      (time (r/fold + (r/map inc (r/filter even? my-vector))))
      ;; 130 msecs
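Because r/map and r/filter are curried, the transformation above can also be packaged as a single reusable function with comp. A small sketch on a toy vector (the name even-inc is made up):

```clojure
(require '[clojure.core.reducers :as r])

;; with one argument, (r/map inc) and (r/filter even?) each return a
;; function awaiting a collection, so they compose right-to-left:
;; filter the evens first, then increment them
(def even-inc (comp (r/map inc) (r/filter even?)))

(r/fold + (even-inc (vec (range 10))))
;; evens 0 2 4 6 8 -> inc -> 1 3 5 7 9 -> sum 25
```

No intermediate collection is materialised at any step; the composed transformation is only run when fold (or reduce) finally consumes it.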
  • 88-89. fold by examples: standard word count

      (def wiki-dump (slurp "subset-wiki-dump50")) ; 50 MB

      (defn count-words [text]
        (reduce (fn [memo word]
                  (assoc memo word (inc (get memo word 0))))
                {}
                (map #(.toLowerCase %) (into [] (re-seq #"\w+" text)))))

      (time (count-words wiki-dump)) ;; 45 secs
  • 90-95. fold by examples: parallel word count

      (def wiki-dump (slurp "subset-wiki-dump50")) ; 50 MB

      (defn p-count-words [text]
        (r/fold (r/monoid (partial merge-with +) hash-map)
                (fn [memo word]
                  (assoc memo word (inc (get memo word 0))))
                (r/map #(.toLowerCase %) (into [] (re-seq #"\w+" text)))))

    Here (r/monoid (partial merge-with +) hash-map) is the combining fn: hash-map will be called with no arguments to provide a seed value, and (partial merge-with +) will be called at the leaves to merge the partial computations.

      (time (p-count-words wiki-dump)) ;; 30 secs
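The same function is easy to try without a 50 MB dump; here it runs over a tiny in-memory string instead (on a collection this small, fold simply falls back to a serial reduce):

```clojure
(require '[clojure.core.reducers :as r])

(defn p-count-words [text]
  (r/fold (r/monoid (partial merge-with +) hash-map) ; seed + merge of partial maps
          (fn [memo word]
            (assoc memo word (inc (get memo word 0))))
          (r/map #(.toLowerCase %) (into [] (re-seq #"\w+" text)))))

(p-count-words "The cat saw the other cat")
;; => a map of counts: {"the" 2, "cat" 2, "saw" 1, "other" 1} (key order may vary)
```

Note the (into [] ...): fold only parallelises over foldable collections such as vectors, which is why the seq from re-seq is poured into one first.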
  • 96-100. fold by examples: load 100k records into PostgreSQL

      (def records
        (into [] (line-seq (BufferedReader. (FileReader. "dump.txt")))))

      ;; sequential
      (time
        (doseq [record records]
          (let [tokens (clojure.string/split record #"\t")]
            (insert users/users
              (values {:account-id (nth tokens 0) ...})))))
      ;; 90 secs

      ;; in parallel
      (time
        (r/fold +
                (r/map (fn [record]
                         (let [tokens (clojure.string/split record #"\t")]
                           (do (insert users/users
                                 (values {:account-id (nth tokens 0) ...}))
                               1)))
                       records)))
      ;; 50 secs
  • 101-106. When to use it:
    • Exploring decision trees
    • Image processing
    • As a building block for bigger, distributed systems such as Datomic and Cascalog (maybe around parallel aggregators)
    • Basically any list-intensive program
    But the tools are available to anyone, so be creative!
  • 107. Resources:
    • The Anatomy of a Reducer - http://bit.ly/anatomyReducers
    • Rich's announcement post on Reducers - http://bit.ly/reducersANN
    • Rich Hickey - Reducers - EuroClojure 2012 - http://bit.ly/reducersVideo (this presentation was heavily inspired by this video)
    • The source on GitHub - http://bit.ly/reducersCore
    Leonardo Borges (@leonardo_borges) - http://www.leonardoborges.com - http://www.thoughtworks.com
  • 108. Thanks! Questions? Leonardo Borges (@leonardo_borges) - http://www.leonardoborges.com - http://www.thoughtworks.com